On 03/31/2011 12:49 PM, Cook, Malcolm wrote:
Martin,
Hmmm, Thanks .... I think I'm getting it....
Following your lead, I can directly re-order cnt in the OriginalOrder as:
cnt[sort(unlist(split(values(which2),
seqnames(which2)))$OriginalOrder,index.return=TRUE)$ix,]
space start end width file records nucleotides
2 seq2 100 1000 901 ex1.bam 1169 41235
1 seq1 1000 2000 1001 ex1.bam 612 21549
3 seq2 1000 2000 1001 ex1.bam 642 22640
... which can usefully(?) be abstracted as
countBamWhich<- function (file,which,index=file,...) {
### wrapper for countBam that reorders its results to aggree with
### 'which', a required ScanBamParam. ... params are additional
### ScanBamParam options.
###
### ASSUMES: internal implementation detail of ScanBamParam. c.f.
###
http://permalink.gmane.org/gmane.science.biology.informatics.conductor/34208)
param<- ScanBamParam(which=which,...)
values(which)[['OriginalOrder']]<- 1:length(which)
CBW = countBam(file,index,param=ScanBamParam(which=which))
CBW[sort(unlist(split(values(which),
seqnames(which)))$OriginalOrder,index.return=TRUE)$ix,]
}
Hi Malcolm -- maybe the return value is a GRanges with counts? It's
probably better to split a simple vector and use that to impose order on
the result, rather than splitting values(). The return value could then
be cbind'ed as needed...
countBam_GRanges <-
function(file, index=file, ..., param=GRanges())
{
if (0L == length(param))
stop("'param' must have length() != 0")
## FIXME: not sure about 'drop' argument to split?
splt <- split(seq_len(length(param)), as.factor(seqnames(param)))
o <- order(unlist(splt, use.names=FALSE))
cnt <- countBam(file, index, ..., param=ScanBamParam(which=param))
df <- as(cnt[o, c("records", "nucleotides")], "DataFrame")
values(param) <- df
param
}
counts <- countBam_GRanges(fl, param=which1)
values(which1) <- cbind(values(which1), values(counts))
Martin
allowing me to write
countBamWhich(fl, which2)
space start end width file records nucleotides
2 seq2 100 1000 901 ex1.bam 1169 41235
1 seq1 1000 2000 1001 ex1.bam 612 21549
3 seq2 1000 2000 1001 ex1.bam 642 22640
All in favor?
~ Malcolm
-----Original Message-----
From: Martin Morgan [mailto:[email protected]]
Sent: Thursday, March 31, 2011 11:41 AM
To: Cook, Malcolm
Cc: '[email protected]'; '[email protected]';
'[email protected]'
Subject: Re: [BioC] Re(surrecting): [Bioc-sig-seq] Rsamtools
countBam labeling
On 03/31/2011 08:32 AM, Cook, Malcolm wrote:
Matt et. al.,
I wonder if a satisfactory resolution to the issue of "the order
changes between the GRanges object and the countBam data.frame
http://www.mail-archive.com/[email protected]/msg01144
.html
I am presented with the same issue and poised to tackle it
but wondr
if a generic solution emerged from you inquiries& efforts.
Hi Malcolm --
For a reproducible example,
library(Rsamtools)
example(countBam)
which1<- as(which, "GRanges")
## which2 might be where your data actually starts
which2<- which1[c(2,1,3)]
values(which2)[["OriginalOrder"]]<- 1:3
param<- ScanBamParam(which=which2)
cnt<- countBam(fl, param=param)
What happens is that ScanBamParam converts its argument to an
IRangesList, using split(ranges(which2), seqnames(which2)).
So do the same for the values and unlist
cntVals<- unlist(split(values(which2), seqnames(which2)))
then cbind coerced values
cbind(cnt, as.data.frame(cntVals))
with
> which2
GRanges with 3 ranges and 1 elementMetadata value
seqnames ranges strand | OriginalOrder
<Rle> <IRanges> <Rle> |<integer>
[1] seq2 [ 100, 1000] * | 1
[2] seq1 [1000, 2000] * | 2
[3] seq2 [1000, 2000] * | 3
seqlengths
seq1 seq2
NA NA
> cbind(cnt, as.data.frame(cntVals))
space start end width file records nucleotides OriginalOrder
1 seq1 1000 2000 1001 ex1.bam 612 21549 2
2 seq2 100 1000 901 ex1.bam 1169 41235 1
3 seq2 1000 2000 1001 ex1.bam 642 22640 3
Martin
Thanks,
Malcolm Cook Stowers Institute for Medical Research -
Bioinformatics
Kansas City, Missouri USA
_______________________________________________
Bioconductor mailing
list [email protected]
https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing