Jane,
In BioC 2.5 (release) / R 2.10 and soon to be released BioC 2.6 / R 2.11, the man page for coverage in the IRanges package describes the shift and width arguments, which allow you to vary the bounds for each chromosome/contig within the GRanges/RangesList/RangedData object. As Paul mentioned, the source of the data can influence how you choose to represent your data so it is tricky to speak in generalities. If you provide a workflow begining with the data source, we can provide you code that should work for you.


Cheers,
Patrick


On 4/19/10 6:54 PM, Paul Leo wrote:

There are a couple of how too's about this but briefly it

Say you go a bed file of reads  of "whatever you want to get the
coverage of" ... the bed file is a dataframe with start, end ,and
chromosome info:
Otherwise you can use ShortRead to get the RangeData object of reads...
whatever you a a rangeList or rangeData object of what you are
counting..


Turn the bed file, (as a dataframe) into a Ranges Data object: You can
use a RangeList if you want ...


rl<-RangedData(IRanges(start=bed[,"start"],end=bed[,"end"]),space=bed[,"chr"])
________________________________________

So this has chromosomes in the space section..

You the want coverage on a set of "other regions".... say the exons,
chromosomes, Cpg islands ... whatever:

Eg for exons:
library(GenomicFeatures.Mmusculus.UCSC.mm9)
library(org.Mm.eg.db)

genes<-geneMouse()
the.exons<-exons_deprecated(genes)

"the.exons" is not a ranged data object: (a RangeList can also be used)
of UCSC exons or name you own..

the.exons<- RangesList(
chr10 =
IRanges(start=starts[starts[,"chrom"]=="chr10","txStart"],end=starts[starts[,"chrom"]=="chr10","txStart"]),
chr11  =
IRanges(start=starts[starts[,"chrom"]=="chr11","txStart"],end=starts[starts[,"chrom"]=="chr11","txStart"]),
chr12  =
IRanges(start=starts[starts[,"chrom"]=="chr12","txStart"],end=starts[starts[,"chrom"]=="chr12","txStart"]),
....)

so in summary the.exons is a rangeData or Rangelist objuct of "the
regions you want the coverage on" (al las  start and end of
lapply(ranges(RangedData),coverage, start = 1, end = 1000)


________________________________________-

## do coverage calculation

library(BSgenome.Mmusculus.UCSC.mm9)
mouse.chromlens<-seqlengths(Mmusculus)

Get a rleList objust of the coverage over the genome:

cov<-coverage(rl,width=mouse.chromlens) ## list names are the chromosome
names/spcae names in rl...you don't have to use "width"
##

cov<-cov[names(the.exons)] # make sure list in the same order can this not 
checked in the  line below please check the chromosome names are consistentin rl 
and the.exons

you now have the coverge on all the chromosomes.... but for other specified 
regions use:


regionViews<- RleViewsList(rleList = cov, rangesList = ranges(the.exons))  ###


## your done!
____________________________________________

##dumP the reults as a dataframe::
the.counts<-{}
order.chromos<-names(the.exons)
for (i in 1:length(order.chromos)){
   chromo<-order.chromos[i]
   
the.counts<-rbind(the.counts,cbind(chromo,start(the.exons[chromo]),end(the.exons[chromo]),
 viewSums(regionViews[[chromo]]) ))
}
colnames(the.counts)<- c("chr","start","end","count")

Cheers
Paul





-----Original Message-----
From: Jane Landolin<[email protected]>
To: [email protected]
Subject: [Bioc-sig-seq] Coverage for RangeList?
Date: Mon, 19 Apr 2010 16:40:54 -0700


Hi all,

I just joined the list, and was hoping to get some advice about the
best way to work with the various classes that deal with genomic range
info. I am a postdoc in the Celniker Lab at Lawrence Berkeley National
Lab, working on the modENCODE project.

So my question has to do with using various functions that operate on
the IRanges level, while keeping my data in RangedData or RangeList
objects.  I should be able to use rdapply or lapply, but it is not
very clear to me when I should use one over another.

If I want to calculate the chromosome-by-chromosome coverage for a
RangeData object, I can do something like:

lapply(ranges(RangedData),coverage, start = 1, end = 1000)

but how can I do it for varying starts and ends?

Alternatively, is it better to write a function and call it using
rdapply?  What have you guys been doing for this scenario?


Thanks in advance,
Jane Landolin

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to