On Wed, Aug 19, 2009 at 5:11 AM, Simon Anders <[email protected]> wrote:
> Hi Florence > > I CC this mail to the bioc-sig-sequencing mailing list, as this is there > such stuff is discussed, and your problem might be of interest for a few > more people. > > Florence Cavalli wrote: > > How are you? Do you enjoy being in Heidelberg so far? Hope so! > > Thanks, I am slowly settling in. I still dont' have a flat and live in the > Boxberg guest house but that's fine for the moment. > > Can I ask you a question? It may be quite simple but I am a bit >> confuse with different classes >> >> I detect some peaks and would like to create a wig file to look at >> them in a browser. >> >> cov_polII_wt >>> >> A GenomeData instance >> chromosomes(15): 2L 2LHet 2R 2RHet 3L 3LHet ... U Uextra X XHet YHet >> >>> cov_polII_wt[["X"]] >>> >> 'integer' Rle instance of length 22422250 with 1066959 runs >> Lengths: 27 60 31 32 27 13 47 8 23 72 ... >> Values : 1 2 3 4 3 2 3 2 3 2 ... >> >>> polII_wt_peaks_X=slice(cov_polII_wt[["X"]],8) >>> polII_wt_peaks_X >>> >> Views on a 22422250-length Rle subject >> >> views: >> start end width >> [1] 2679 2927 249 [ 8 8 8 9 9 9 9 9 9 9 9 9 9 >> ...] >> [2] 4300 4381 82 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 >> ...] >> [3] 5581 5837 257 [8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 >> ...] >> [4] 5885 6033 149 [ 9 9 9 9 9 10 10 10 10 10 10 10 10 >> ...] >> [5] 6878 7020 143 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 >> ...] >> [6] 7025 7070 46 [8 8 9 8 8 8 8 9 9 9 9 9 9 9 9 9 9 8 8 8 >> ...] >> [7] 7555 7704 150 [ 8 8 8 8 8 8 8 8 9 9 9 9 9 >> ...] >> [8] 7795 8022 228 [ 8 8 8 8 8 8 8 8 9 9 10 10 10 >> ...] >> [9] 8332 8338 7 [8 8 8 8 8 8 8] >> ... ... ... ... ... >> [18252] 22414407 22414570 164 [ 8 8 8 8 9 9 9 9 9 9 9 9 9 >> ...] >> [18253] 22414629 22414639 11 [8 8 8 8 8 8 8 8 8 8 8] >> [18254] 22414655 22414657 3 [8 8 8] >> [18255] 22414678 22414778 101 [ 8 8 8 8 8 8 9 9 9 9 9 10 10 >> ...] >> [18256] 22414792 22415612 821 [ 8 8 8 8 8 8 8 8 9 9 9 9 9 >> ...] >> [18257] 22415635 22415650 16 [9 9 9 9 8 8 8 8 8 8 8 8 8 8 8 8] >> [18258] 22415728 22415737 10 [9 9 8 8 8 8 8 8 8 8] >> [18259] 22415749 22415784 36 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 >> ...] >> [18260] 22415830 22415871 42 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 >> ...] >> >> >> From your workshop I did before: >>> >> cov_polII_wt_X_rd=as(cov_polII_wt[["X"]],"RangedData") >> names(cov_polII_wt_X_rd)="X" >> export(cov_polII_wt_X_rd,"X_polII_wt.wig",format="wig") >> >> How could I do the same for the peaks I have in polII_wt_peaks_X (or >> the peaks on all chromosomes) ? >> I don't know how to convert a RleViews to a Rle. >> > > An RleViews object is a list of "views" into an Rle object. "View" means > that it is still the same dat but you just look a a small portion of the > whole vector. > > To visualize the peaks, I see three possibilities: > > (a) You assemble the RleViews object with all the peaks into one big Rle > object. This is essentially the same as taking the original Rle object and > setting all those places to 0 that are below the threshold. > > My first attempt would have been to write > > rleobj <- cov_polII_wt[["X"]] > rleobj[ rleobj < 8 ] <- 0 > > but this does not work as the Rle class does not implement `[<-`. > > In this specific case, the boundaries of the runs will not change, so we > can use the `value<-` method which is provided: > > rleobj <- cov_polII_wt[["X"]] > values(rleobj) <- ifelse( values(rleobj) < 8, 0, values(rleobj) ) > > > (b) Maybe setting the background to zero is not the best option anyway. It > might be nicer to have a BED file that indicates the peaks alongside the > wiggle file with the coverage values and display both in adjacent tracks in > the genome browser. A BED file is essentially a tab-seperated values file > with at least the columns chromosome, start and end. You could make a data > frame out of the 'start' and 'end' values from the Rle object and write it > out with write.table. You might want to add the peak area as another column. > (See here for the correct order of the columns: > http://genome.ucsc.edu/FAQ/FAQformat ) > > I am not sure on how to use rtracklayer's 'export' function for this. > Michael, do you have any comment? > The RleViews object is a Ranges object, so if you want a BED file of those ranges, it's: gd <- GenomicData(polII_wt_peaks_X) export(gd, "peaks.bed") Or if you want the peak max to be the score (translated to gray scale at UCSC): gd <- GenomicData(polII_wt_peaks_X, score = viewMaxs(poIII_wt_peaks_X)) export(gd, "peaks.bed") Note that you can do this for all chromosomes fairly easily: polII_wt_peaks <- gdapply(cov_polII_wt, slice) # another GenomeData export(as(polll_wt_peaks, "RangedData"), "peaks.bed") I thought there might be a slice method for "GenomeData" that returned an RleViewsList, but I could not find one. The same export line would work in that case. > > (c) Finally, for your problem, there is a very easy solution. If you use > the Integrated Genome Browser (IGB) to view yor data, you will see that it > has a thresholding function. If you set it to a value, it marks all > stretches of a track where the value exceeds the threshold. > > The other thing is: if I use the R-2.10.1 version it does not >> recognise the GenomeData object: >> >>> cov_polII_wt >>> >> A GenomeData instanceError in .local(x, ...) : >> no slot of name "metadata" for this object of class "GenomeData" >> >> Do you know why? >> I would like to use the library chipseq, so maybe I should recreate >> the cov_polII_wt under R-2.10.0???? >> >> > There is no version 2.10.1 yet of R. Do you mean R-2.9.1? Then, yes, you > can have problems with objects created with R-2.9 and saved from there if > you try to load them in R-2.10. Strictly speaking, the version of R does not > matter but if you use R-2.9, biocLite takes the packages from Bioconductor > 2.4 and, in R-2.10, it takes them from Bioc 2.5, which is still under > development and keeps changing, so that you have to expect that objects > become incompatible. > > If you want to use the chipseq package you should do everything with > R-2.10. > > sessionInfo() >>> >> R version 2.9.1 (2009-06-26) >> x86_64-unknown-linux-gnu >> >> locale: >> >> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] rtracklayer_1.4.1 RCurl_0.98-1 bitops_1.0-4.1 ShortRead_1.2.1 >> [5] lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.7 IRanges_1.2.3 >> >> loaded via a namespace (and not attached): >> [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1 tools_2.9.1 XML_2.5-3 >> > > Cheers > Simon > > > +--- > | Dr. Simon Anders, Dipl.-Phys. > | European Molecular Biology Laboratory (EMBL), Heidelberg > | office phone +49-6221-387-8632 > | preferred (permanent) e-mail: [email protected] > > _______________________________________________ > Bioc-sig-sequencing mailing list > [email protected] > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
