On 03/31/2011 12:49 PM, Cook, Malcolm wrote:
Martin,

Hmmm, Thanks .... I think I'm getting it....

Following your lead, I can directly re-order cnt in the OriginalOrder as:

cnt[sort(unlist(split(values(which2), 
seqnames(which2)))$OriginalOrder,index.return=TRUE)$ix,]
   space start  end width    file records nucleotides
2  seq2   100 1000   901 ex1.bam    1169       41235
1  seq1  1000 2000  1001 ex1.bam     612       21549
3  seq2  1000 2000  1001 ex1.bam     642       22640


... which can usefully(?) be abstracted as

countBamWhich<- function (file,which,index=file,...) {
### wrapper for countBam that reorders its results to aggree with
### 'which', a required ScanBamParam.  ... params are additional
### ScanBamParam options.
###
### ASSUMES: internal implementation detail of ScanBamParam. c.f.
### 
http://permalink.gmane.org/gmane.science.biology.informatics.conductor/34208)
   param<- ScanBamParam(which=which,...)
   values(which)[['OriginalOrder']]<- 1:length(which)
   CBW = countBam(file,index,param=ScanBamParam(which=which))
   CBW[sort(unlist(split(values(which),
                         seqnames(which)))$OriginalOrder,index.return=TRUE)$ix,]
}

Hi Malcolm -- maybe the return value is a GRanges with counts? It's probably better to split a simple vector and use that to impose order on the result, rather than splitting values(). The return value could then be cbind'ed as needed...

countBam_GRanges <-
    function(file, index=file, ..., param=GRanges())
{
    if (0L == length(param))
        stop("'param' must have length() != 0")
    ## FIXME: not sure about 'drop' argument to split?
    splt <- split(seq_len(length(param)), as.factor(seqnames(param)))
    o <- order(unlist(splt, use.names=FALSE))
    cnt <- countBam(file, index, ..., param=ScanBamParam(which=param))
    df <- as(cnt[o, c("records", "nucleotides")], "DataFrame")
    values(param) <- df
    param
}

counts <- countBam_GRanges(fl, param=which1)
values(which1) <- cbind(values(which1), values(counts))

Martin


allowing me to write

countBamWhich(fl, which2)
   space start  end width    file records nucleotides
2  seq2   100 1000   901 ex1.bam    1169       41235
1  seq1  1000 2000  1001 ex1.bam     612       21549
3  seq2  1000 2000  1001 ex1.bam     642       22640

All in favor?

~ Malcolm




-----Original Message-----
From: Martin Morgan [mailto:[email protected]]
Sent: Thursday, March 31, 2011 11:41 AM
To: Cook, Malcolm
Cc: '[email protected]'; '[email protected]';
'[email protected]'
Subject: Re: [BioC] Re(surrecting): [Bioc-sig-seq] Rsamtools
countBam labeling

On 03/31/2011 08:32 AM, Cook, Malcolm wrote:
Matt et. al.,

I wonder if a satisfactory resolution to the issue of "the order
changes between the GRanges object and the countBam data.frame


http://www.mail-archive.com/[email protected]/msg01144
.html

  I am presented with the same issue and poised to tackle it
but wondr
if a generic solution emerged from you inquiries&   efforts.

Hi Malcolm --

For a reproducible example,

    library(Rsamtools)
    example(countBam)
    which1<- as(which, "GRanges")
    ## which2 might be where your data actually starts
    which2<- which1[c(2,1,3)]
    values(which2)[["OriginalOrder"]]<- 1:3
    param<- ScanBamParam(which=which2)
    cnt<- countBam(fl, param=param)

What happens is that ScanBamParam converts its argument to an
IRangesList, using split(ranges(which2), seqnames(which2)).
So do the same for the values and unlist

    cntVals<- unlist(split(values(which2), seqnames(which2)))

then cbind coerced values

    cbind(cnt, as.data.frame(cntVals))

with

  >  which2
GRanges with 3 ranges and 1 elementMetadata value
      seqnames       ranges strand | OriginalOrder
         <Rle>     <IRanges>   <Rle>  |<integer>
[1]     seq2 [ 100, 1000]      * |             1
[2]     seq1 [1000, 2000]      * |             2
[3]     seq2 [1000, 2000]      * |             3

seqlengths
   seq1 seq2
     NA   NA
  >  cbind(cnt, as.data.frame(cntVals))
    space start  end width    file records nucleotides OriginalOrder
1  seq1  1000 2000  1001 ex1.bam     612       21549             2
2  seq2   100 1000   901 ex1.bam    1169       41235             1
3  seq2  1000 2000  1001 ex1.bam     642       22640             3

Martin


Thanks,

Malcolm Cook Stowers Institute for Medical Research -
Bioinformatics
Kansas City, Missouri  USA


_______________________________________________
Bioconductor mailing
list [email protected]
https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to