Re: [Bioc-sig-seq] Rle, RleViews

Patrick Aboyoun Wed, 19 Aug 2009 15:48:21 -0700

Florence,
I just made three changes to the BioC 2.5 sequence infrastructure that should 
make this process much more straight-forward.


Although you didn't state this in your e-mail, I am guessing that your 
cov_polII_wt object was created by using ShortRead's coverage method on an 
AlignedRead object (also created by ShortRead). The GenomeData class was 
created at a time when most of the sequence infrastructure was in flux and like 
scaffolding used in bricks and mortar work should probably be dismantled.

The first change I made was to have the coverage method for AlignedRead to 
return a SimpleRleList object. This object class is more functional than 
GenomeData with methods for functions such as slice. The second change I made 
was to make rtracklayer's export.* functions coerce object to class RangedData 
when there is no other available method for that object type. The third change 
I made was add another coercion RleList to RangedData coercion method to 
support this workflow.


Here is what the new workflow looks like:

# load packages
suppressMessages(library(ShortRead))
suppressMessages(library(rtracklayer))

# read alignments
dirPath <- system.file('extdata', 'maq', package='ShortRead')
aln <- readAligned(dirPath, 'out.aln.1.txt', type="MAQMapview")

# create coverage across chromosomes/contigs
cov_polII_wt <- coverage(aln)
names(cov_polII_wt) <- tolower(names(cov_polII_wt))
cov_polII_wt

SimpleRleList: 1 element
names(1): chra

# find peaks
peaks_polII_wt <- slice(cov_polII_wt,8)
peaks_polII_wt

SimpleRleViewsList: 1 element
names(1): chra

# export to UCSC file formats
export(cov_polII_wt,"cov_polII_wt.wig")
export(peaks_polII_wt,"peaks_polII_wt.bed")

sessionInfo()

R version 2.10.0 Under development (unstable) (2009-08-05 r49073)i386-apple-darwin9.7.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] rtracklayer_1.5.12 RCurl_0.98-1 bitops_1.0-4.1 ShortRead_1.3.26[5] lattice_0.17-25 BSgenome_1.13.10 Biostrings_2.13.32 IRanges_1.3.57

loaded via a namespace (and not attached):

[1] Biobase_2.5.5 grid_2.10.0 hwriter_1.1 XML_2.6-0




Michael Lawrence wrote:

On Wed, Aug 19, 2009 at 5:11 AM, Simon Anders <[email protected]> wrote:

Hi Florence

I CC this mail to the bioc-sig-sequencing mailing list, as this is there
such stuff is discussed, and your problem might be of interest for a few
 more people.

Florence Cavalli wrote:

How are you? Do you enjoy being in Heidelberg so far? Hope so!

Thanks, I am slowly settling in. I still dont' have a flat and live in the
Boxberg guest house but that's fine for the moment.

 Can I ask you a question? It may be quite simple but I am a bit

confuse with different classes

I detect some peaks and would like to create a wig file to look at
them in a browser.

 cov_polII_wt

A GenomeData instance

chromosomes(15): 2L 2LHet 2R 2RHet 3L 3LHet ... U Uextra X XHet YHet

cov_polII_wt[["X"]]

 'integer' Rle instance of length 22422250 with 1066959 runs
 Lengths:  27 60 31 32 27 13 47 8 23 72 ...
 Values :  1 2 3 4 3 2 3 2 3 2 ...

polII_wt_peaks_X=slice(cov_polII_wt[["X"]],8)
polII_wt_peaks_X

 Views on a 22422250-length Rle subject

views:
          start      end width
   [1]     2679     2927   249 [ 8  8  8  9  9  9  9  9  9  9  9  9  9
...]
   [2]     4300     4381    82 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
...]
   [3]     5581     5837   257 [8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9
...]
   [4]     5885     6033   149 [ 9  9  9  9  9 10 10 10 10 10 10 10 10
...]
   [5]     6878     7020   143 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9
...]
   [6]     7025     7070    46 [8 8 9 8 8 8 8 9 9 9 9 9 9 9 9 9 9 8 8 8
...]
   [7]     7555     7704   150 [ 8  8  8  8  8  8  8  8  9  9  9  9  9
...]
   [8]     7795     8022   228 [ 8  8  8  8  8  8  8  8  9  9 10 10 10
...]
   [9]     8332     8338     7 [8 8 8 8 8 8 8]
   ...      ...      ...   ... ...
[18252] 22414407 22414570   164 [ 8  8  8  8  9  9  9  9  9  9  9  9  9
...]
[18253] 22414629 22414639    11 [8 8 8 8 8 8 8 8 8 8 8]
[18254] 22414655 22414657     3 [8 8 8]
[18255] 22414678 22414778   101 [ 8  8  8  8  8  8  9  9  9  9  9 10 10
...]
[18256] 22414792 22415612   821 [ 8  8  8  8  8  8  8  8  9  9  9  9  9
...]
[18257] 22415635 22415650    16 [9 9 9 9 8 8 8 8 8 8 8 8 8 8 8 8]
[18258] 22415728 22415737    10 [9 9 8 8 8 8 8 8 8 8]
[18259] 22415749 22415784    36 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
...]
[18260] 22415830 22415871    42 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9
...]


 From your workshop I did before:

cov_polII_wt_X_rd=as(cov_polII_wt[["X"]],"RangedData")

names(cov_polII_wt_X_rd)="X"
export(cov_polII_wt_X_rd,"X_polII_wt.wig",format="wig")

How could I do the same for the peaks I have in polII_wt_peaks_X (or
the peaks on all chromosomes) ?
I don't know how to convert a RleViews to a Rle.

An RleViews object is a list of "views" into an Rle object. "View" means
that it is still the same dat but you just look a a small portion of the
whole vector.

To visualize the peaks, I see three possibilities:

(a) You assemble the RleViews object with all the peaks into one big Rle
object. This is essentially the same as taking the original Rle object and
setting all those places to 0 that are below the threshold.

My first attempt would have been to write

  rleobj <- cov_polII_wt[["X"]]
  rleobj[ rleobj < 8 ] <- 0

but this does not work as the Rle class does not implement `[<-`.

In this specific case, the boundaries of the runs will not change, so we
can use the `value<-` method which is provided:

  rleobj <- cov_polII_wt[["X"]]
  values(rleobj) <- ifelse( values(rleobj) < 8, 0, values(rleobj) )


(b) Maybe setting the background to zero is not the best option anyway. It
might be nicer to have a BED file that indicates the peaks alongside the
wiggle file with the coverage values and display both in adjacent tracks in
the genome browser. A BED file is essentially a tab-seperated values file
with at least the columns chromosome, start and end. You could make a data
frame out of the 'start' and 'end' values from the Rle object and write it
out with write.table. You might want to add the peak area as another column.
(See here for the correct order of the columns:
http://genome.ucsc.edu/FAQ/FAQformat )

I am not sure on how to use rtracklayer's 'export' function for this.
Michael, do you have any comment?


The RleViews object is a Ranges object, so if you want a BED file of those
ranges, it's:
gd <- GenomicData(polII_wt_peaks_X)
export(gd, "peaks.bed")

Or if you want the peak max to be the score (translated to gray scale at
UCSC):
gd <- GenomicData(polII_wt_peaks_X, score = viewMaxs(poIII_wt_peaks_X))
export(gd, "peaks.bed")

Note that you can do this for all chromosomes fairly easily:

polII_wt_peaks <- gdapply(cov_polII_wt, slice) # another GenomeData
export(as(polll_wt_peaks, "RangedData"), "peaks.bed")

I thought there might be a slice method for "GenomeData" that returned an
RleViewsList, but I could not find one. The same export line would work in
that case.

(c) Finally, for your problem, there is a very easy solution. If you use
the Integrated Genome Browser (IGB) to view yor data, you will see that it
has a thresholding function. If you set it to a value, it marks all
stretches of a track where the value exceeds the threshold.

 The other thing is: if I use the R-2.10.1 version it does not

recognise the GenomeData object:

cov_polII_wt

A GenomeData instanceError in .local(x, ...) :
 no slot of name "metadata" for this object of class "GenomeData"

Do you know why?
I would like to use the library chipseq, so maybe I should recreate
the cov_polII_wt under R-2.10.0????

There is no version 2.10.1 yet of R. Do you mean R-2.9.1? Then, yes, you
can have problems with objects created with R-2.9 and saved from there if
you try to load them in R-2.10. Strictly speaking, the version of R does not
matter but if you use R-2.9, biocLite takes the packages from Bioconductor
2.4 and, in R-2.10, it takes them from Bioc 2.5, which is still under
development and keeps changing, so that you have to expect that objects
become incompatible.

If you want to use the chipseq package you should do everything with
R-2.10.

 sessionInfo()

R version 2.9.1 (2009-06-26)
x86_64-unknown-linux-gnu

locale:

LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rtracklayer_1.4.1 RCurl_0.98-1      bitops_1.0-4.1    ShortRead_1.2.1
[5] lattice_0.17-25   BSgenome_1.12.3   Biostrings_2.12.7 IRanges_1.2.3

loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.1    hwriter_1.1   tools_2.9.1   XML_2.5-3

Cheers
 Simon


+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: [email protected]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] Rle, RleViews

Reply via email to