There are a couple of how too's about this but briefly it
Say you go a bed file of reads of "whatever you want to get the
coverage of" ... the bed file is a dataframe with start, end ,and
chromosome info:
Otherwise you can use ShortRead to get the RangeData object of reads...
whatever you a a rangeList or rangeData object of what you are
counting..
Turn the bed file, (as a dataframe) into a Ranges Data object: You can
use a RangeList if you want ...
rl<-RangedData(IRanges(start=bed[,"start"],end=bed[,"end"]),space=bed[,"chr"])
________________________________________
So this has chromosomes in the space section..
You the want coverage on a set of "other regions".... say the exons,
chromosomes, Cpg islands ... whatever:
Eg for exons:
library(GenomicFeatures.Mmusculus.UCSC.mm9)
library(org.Mm.eg.db)
genes<-geneMouse()
the.exons<-exons_deprecated(genes)
"the.exons" is not a ranged data object: (a RangeList can also be used)
of UCSC exons or name you own..
the.exons<- RangesList(
chr10 =
IRanges(start=starts[starts[,"chrom"]=="chr10","txStart"],end=starts[starts[,"chrom"]=="chr10","txStart"]),
chr11 =
IRanges(start=starts[starts[,"chrom"]=="chr11","txStart"],end=starts[starts[,"chrom"]=="chr11","txStart"]),
chr12 =
IRanges(start=starts[starts[,"chrom"]=="chr12","txStart"],end=starts[starts[,"chrom"]=="chr12","txStart"]),
....)
so in summary the.exons is a rangeData or Rangelist objuct of "the
regions you want the coverage on" (al las start and end of
lapply(ranges(RangedData),coverage, start = 1, end = 1000)
________________________________________-
## do coverage calculation
library(BSgenome.Mmusculus.UCSC.mm9)
mouse.chromlens<-seqlengths(Mmusculus)
Get a rleList objust of the coverage over the genome:
cov<-coverage(rl,width=mouse.chromlens) ## list names are the chromosome
names/spcae names in rl...you don't have to use "width"
##
cov<-cov[names(the.exons)] # make sure list in the same order can this not
checked in the line below please check the chromosome names are consistentin
rl and the.exons
you now have the coverge on all the chromosomes.... but for other specified
regions use:
regionViews <- RleViewsList(rleList = cov, rangesList = ranges(the.exons)) ###
## your done!
____________________________________________
##dumP the reults as a dataframe::
the.counts<-{}
order.chromos<-names(the.exons)
for (i in 1:length(order.chromos)){
chromo<-order.chromos[i]
the.counts<-rbind(the.counts,cbind(chromo,start(the.exons[chromo]),end(the.exons[chromo]),
viewSums(regionViews[[chromo]]) ))
}
colnames(the.counts) <- c("chr","start","end","count")
Cheers
Paul
-----Original Message-----
From: Jane Landolin <[email protected]>
To: [email protected]
Subject: [Bioc-sig-seq] Coverage for RangeList?
Date: Mon, 19 Apr 2010 16:40:54 -0700
Hi all,
I just joined the list, and was hoping to get some advice about the
best way to work with the various classes that deal with genomic range
info. I am a postdoc in the Celniker Lab at Lawrence Berkeley National
Lab, working on the modENCODE project.
So my question has to do with using various functions that operate on
the IRanges level, while keeping my data in RangedData or RangeList
objects. I should be able to use rdapply or lapply, but it is not
very clear to me when I should use one over another.
If I want to calculate the chromosome-by-chromosome coverage for a
RangeData object, I can do something like:
lapply(ranges(RangedData),coverage, start = 1, end = 1000)
but how can I do it for varying starts and ends?
Alternatively, is it better to write a function and call it using
rdapply? What have you guys been doing for this scenario?
Thanks in advance,
Jane Landolin
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing