Florence,
I just made three changes to the BioC 2.5 sequence infrastructure that should
make this process much more straight-forward.
Although you didn't state this in your e-mail, I am guessing that your
cov_polII_wt object was created by using ShortRead's coverage method on an
AlignedRead object (also created by ShortRead). The GenomeData class was
created at a time when most of the sequence infrastructure was in flux and like
scaffolding used in bricks and mortar work should probably be dismantled.
The first change I made was to have the coverage method for AlignedRead to
return a SimpleRleList object. This object class is more functional than
GenomeData with methods for functions such as slice. The second change I made
was to make rtracklayer's export.* functions coerce object to class RangedData
when there is no other available method for that object type. The third change
I made was add another coercion RleList to RangedData coercion method to
support this workflow.
Here is what the new workflow looks like:
# load packages
suppressMessages(library(ShortRead))
suppressMessages(library(rtracklayer))
# read alignments
dirPath <- system.file('extdata', 'maq', package='ShortRead')
aln <- readAligned(dirPath, 'out.aln.1.txt', type="MAQMapview")
# create coverage across chromosomes/contigs
cov_polII_wt <- coverage(aln)
names(cov_polII_wt) <- tolower(names(cov_polII_wt))
cov_polII_wt
SimpleRleList: 1 element
names(1): chra
# find peaks
peaks_polII_wt <- slice(cov_polII_wt,8)
peaks_polII_wt
SimpleRleViewsList: 1 element
names(1): chra
# export to UCSC file formats
export(cov_polII_wt,"cov_polII_wt.wig")
export(peaks_polII_wt,"peaks_polII_wt.bed")
sessionInfo()
R version 2.10.0 Under development (unstable) (2009-08-05 r49073)
i386-apple-darwin9.7.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.5.12 RCurl_0.98-1 bitops_1.0-4.1 ShortRead_1.3.26
[5] lattice_0.17-25 BSgenome_1.13.10 Biostrings_2.13.32 IRanges_1.3.57
loaded via a namespace (and not attached):
[1] Biobase_2.5.5 grid_2.10.0 hwriter_1.1 XML_2.6-0
Michael Lawrence wrote:
On Wed, Aug 19, 2009 at 5:11 AM, Simon Anders <[email protected]> wrote:
Hi Florence
I CC this mail to the bioc-sig-sequencing mailing list, as this is there
such stuff is discussed, and your problem might be of interest for a few
more people.
Florence Cavalli wrote:
How are you? Do you enjoy being in Heidelberg so far? Hope so!
Thanks, I am slowly settling in. I still dont' have a flat and live in the
Boxberg guest house but that's fine for the moment.
Can I ask you a question? It may be quite simple but I am a bit
confuse with different classes
I detect some peaks and would like to create a wig file to look at
them in a browser.
cov_polII_wt
A GenomeData instance
chromosomes(15): 2L 2LHet 2R 2RHet 3L 3LHet ... U Uextra X XHet YHet
cov_polII_wt[["X"]]
'integer' Rle instance of length 22422250 with 1066959 runs
Lengths: 27 60 31 32 27 13 47 8 23 72 ...
Values : 1 2 3 4 3 2 3 2 3 2 ...
polII_wt_peaks_X=slice(cov_polII_wt[["X"]],8)
polII_wt_peaks_X
Views on a 22422250-length Rle subject
views:
start end width
[1] 2679 2927 249 [ 8 8 8 9 9 9 9 9 9 9 9 9 9
...]
[2] 4300 4381 82 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
...]
[3] 5581 5837 257 [8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9
...]
[4] 5885 6033 149 [ 9 9 9 9 9 10 10 10 10 10 10 10 10
...]
[5] 6878 7020 143 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9
...]
[6] 7025 7070 46 [8 8 9 8 8 8 8 9 9 9 9 9 9 9 9 9 9 8 8 8
...]
[7] 7555 7704 150 [ 8 8 8 8 8 8 8 8 9 9 9 9 9
...]
[8] 7795 8022 228 [ 8 8 8 8 8 8 8 8 9 9 10 10 10
...]
[9] 8332 8338 7 [8 8 8 8 8 8 8]
... ... ... ... ...
[18252] 22414407 22414570 164 [ 8 8 8 8 9 9 9 9 9 9 9 9 9
...]
[18253] 22414629 22414639 11 [8 8 8 8 8 8 8 8 8 8 8]
[18254] 22414655 22414657 3 [8 8 8]
[18255] 22414678 22414778 101 [ 8 8 8 8 8 8 9 9 9 9 9 10 10
...]
[18256] 22414792 22415612 821 [ 8 8 8 8 8 8 8 8 9 9 9 9 9
...]
[18257] 22415635 22415650 16 [9 9 9 9 8 8 8 8 8 8 8 8 8 8 8 8]
[18258] 22415728 22415737 10 [9 9 8 8 8 8 8 8 8 8]
[18259] 22415749 22415784 36 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
...]
[18260] 22415830 22415871 42 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9
...]
From your workshop I did before:
cov_polII_wt_X_rd=as(cov_polII_wt[["X"]],"RangedData")
names(cov_polII_wt_X_rd)="X"
export(cov_polII_wt_X_rd,"X_polII_wt.wig",format="wig")
How could I do the same for the peaks I have in polII_wt_peaks_X (or
the peaks on all chromosomes) ?
I don't know how to convert a RleViews to a Rle.
An RleViews object is a list of "views" into an Rle object. "View" means
that it is still the same dat but you just look a a small portion of the
whole vector.
To visualize the peaks, I see three possibilities:
(a) You assemble the RleViews object with all the peaks into one big Rle
object. This is essentially the same as taking the original Rle object and
setting all those places to 0 that are below the threshold.
My first attempt would have been to write
rleobj <- cov_polII_wt[["X"]]
rleobj[ rleobj < 8 ] <- 0
but this does not work as the Rle class does not implement `[<-`.
In this specific case, the boundaries of the runs will not change, so we
can use the `value<-` method which is provided:
rleobj <- cov_polII_wt[["X"]]
values(rleobj) <- ifelse( values(rleobj) < 8, 0, values(rleobj) )
(b) Maybe setting the background to zero is not the best option anyway. It
might be nicer to have a BED file that indicates the peaks alongside the
wiggle file with the coverage values and display both in adjacent tracks in
the genome browser. A BED file is essentially a tab-seperated values file
with at least the columns chromosome, start and end. You could make a data
frame out of the 'start' and 'end' values from the Rle object and write it
out with write.table. You might want to add the peak area as another column.
(See here for the correct order of the columns:
http://genome.ucsc.edu/FAQ/FAQformat )
I am not sure on how to use rtracklayer's 'export' function for this.
Michael, do you have any comment?
The RleViews object is a Ranges object, so if you want a BED file of those
ranges, it's:
gd <- GenomicData(polII_wt_peaks_X)
export(gd, "peaks.bed")
Or if you want the peak max to be the score (translated to gray scale at
UCSC):
gd <- GenomicData(polII_wt_peaks_X, score = viewMaxs(poIII_wt_peaks_X))
export(gd, "peaks.bed")
Note that you can do this for all chromosomes fairly easily:
polII_wt_peaks <- gdapply(cov_polII_wt, slice) # another GenomeData
export(as(polll_wt_peaks, "RangedData"), "peaks.bed")
I thought there might be a slice method for "GenomeData" that returned an
RleViewsList, but I could not find one. The same export line would work in
that case.
(c) Finally, for your problem, there is a very easy solution. If you use
the Integrated Genome Browser (IGB) to view yor data, you will see that it
has a thresholding function. If you set it to a value, it marks all
stretches of a track where the value exceeds the threshold.
The other thing is: if I use the R-2.10.1 version it does not
recognise the GenomeData object:
cov_polII_wt
A GenomeData instanceError in .local(x, ...) :
no slot of name "metadata" for this object of class "GenomeData"
Do you know why?
I would like to use the library chipseq, so maybe I should recreate
the cov_polII_wt under R-2.10.0????
There is no version 2.10.1 yet of R. Do you mean R-2.9.1? Then, yes, you
can have problems with objects created with R-2.9 and saved from there if
you try to load them in R-2.10. Strictly speaking, the version of R does not
matter but if you use R-2.9, biocLite takes the packages from Bioconductor
2.4 and, in R-2.10, it takes them from Bioc 2.5, which is still under
development and keeps changing, so that you have to expect that objects
become incompatible.
If you want to use the chipseq package you should do everything with
R-2.10.
sessionInfo()
R version 2.9.1 (2009-06-26)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.4.1 RCurl_0.98-1 bitops_1.0-4.1 ShortRead_1.2.1
[5] lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.7 IRanges_1.2.3
loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1 tools_2.9.1 XML_2.5-3
Cheers
Simon
+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: [email protected]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing