On Wed, Aug 19, 2009 at 5:11 AM, Simon Anders <[email protected]> wrote:

> Hi Florence
>
> I CC this mail to the bioc-sig-sequencing mailing list, as this is there
> such stuff is discussed, and your problem might be of interest for a few
>  more people.
>
> Florence Cavalli wrote:
> > How are you? Do you enjoy being in Heidelberg so far? Hope so!
>
> Thanks, I am slowly settling in. I still dont' have a flat and live in the
> Boxberg guest house but that's fine for the moment.
>
>  Can I ask you a question? It may be quite simple but I am a bit
>> confuse with different classes
>>
>> I detect some peaks and would like to create a wig file to look at
>> them in a browser.
>>
>>  cov_polII_wt
>>>
>> A GenomeData instance
>> chromosomes(15): 2L 2LHet 2R 2RHet 3L 3LHet ... U Uextra X XHet YHet
>>
>>> cov_polII_wt[["X"]]
>>>
>>  'integer' Rle instance of length 22422250 with 1066959 runs
>>  Lengths:  27 60 31 32 27 13 47 8 23 72 ...
>>  Values :  1 2 3 4 3 2 3 2 3 2 ...
>>
>>> polII_wt_peaks_X=slice(cov_polII_wt[["X"]],8)
>>> polII_wt_peaks_X
>>>
>>  Views on a 22422250-length Rle subject
>>
>> views:
>>           start      end width
>>    [1]     2679     2927   249 [ 8  8  8  9  9  9  9  9  9  9  9  9  9
>> ...]
>>    [2]     4300     4381    82 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
>> ...]
>>    [3]     5581     5837   257 [8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9
>> ...]
>>    [4]     5885     6033   149 [ 9  9  9  9  9 10 10 10 10 10 10 10 10
>> ...]
>>    [5]     6878     7020   143 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9
>> ...]
>>    [6]     7025     7070    46 [8 8 9 8 8 8 8 9 9 9 9 9 9 9 9 9 9 8 8 8
>> ...]
>>    [7]     7555     7704   150 [ 8  8  8  8  8  8  8  8  9  9  9  9  9
>> ...]
>>    [8]     7795     8022   228 [ 8  8  8  8  8  8  8  8  9  9 10 10 10
>> ...]
>>    [9]     8332     8338     7 [8 8 8 8 8 8 8]
>>    ...      ...      ...   ... ...
>> [18252] 22414407 22414570   164 [ 8  8  8  8  9  9  9  9  9  9  9  9  9
>> ...]
>> [18253] 22414629 22414639    11 [8 8 8 8 8 8 8 8 8 8 8]
>> [18254] 22414655 22414657     3 [8 8 8]
>> [18255] 22414678 22414778   101 [ 8  8  8  8  8  8  9  9  9  9  9 10 10
>> ...]
>> [18256] 22414792 22415612   821 [ 8  8  8  8  8  8  8  8  9  9  9  9  9
>> ...]
>> [18257] 22415635 22415650    16 [9 9 9 9 8 8 8 8 8 8 8 8 8 8 8 8]
>> [18258] 22415728 22415737    10 [9 9 8 8 8 8 8 8 8 8]
>> [18259] 22415749 22415784    36 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
>> ...]
>> [18260] 22415830 22415871    42 [8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9
>> ...]
>>
>>
>>  From your workshop I did before:
>>>
>> cov_polII_wt_X_rd=as(cov_polII_wt[["X"]],"RangedData")
>> names(cov_polII_wt_X_rd)="X"
>> export(cov_polII_wt_X_rd,"X_polII_wt.wig",format="wig")
>>
>> How could I do the same for the peaks I have in polII_wt_peaks_X (or
>> the peaks on all chromosomes) ?
>> I don't know how to convert a RleViews to a Rle.
>>
>
> An RleViews object is a list of "views" into an Rle object. "View" means
> that it is still the same dat but you just look a a small portion of the
> whole vector.
>
> To visualize the peaks, I see three possibilities:
>
> (a) You assemble the RleViews object with all the peaks into one big Rle
> object. This is essentially the same as taking the original Rle object and
> setting all those places to 0 that are below the threshold.
>
> My first attempt would have been to write
>
>   rleobj <- cov_polII_wt[["X"]]
>   rleobj[ rleobj < 8 ] <- 0
>
> but this does not work as the Rle class does not implement `[<-`.
>
> In this specific case, the boundaries of the runs will not change, so we
> can use the `value<-` method which is provided:
>
>   rleobj <- cov_polII_wt[["X"]]
>   values(rleobj) <- ifelse( values(rleobj) < 8, 0, values(rleobj) )
>
>
> (b) Maybe setting the background to zero is not the best option anyway. It
> might be nicer to have a BED file that indicates the peaks alongside the
> wiggle file with the coverage values and display both in adjacent tracks in
> the genome browser. A BED file is essentially a tab-seperated values file
> with at least the columns chromosome, start and end. You could make a data
> frame out of the 'start' and 'end' values from the Rle object and write it
> out with write.table. You might want to add the peak area as another column.
> (See here for the correct order of the columns:
> http://genome.ucsc.edu/FAQ/FAQformat )
>
> I am not sure on how to use rtracklayer's 'export' function for this.
> Michael, do you have any comment?
>

The RleViews object is a Ranges object, so if you want a BED file of those
ranges, it's:
gd <- GenomicData(polII_wt_peaks_X)
export(gd, "peaks.bed")

Or if you want the peak max to be the score (translated to gray scale at
UCSC):
gd <- GenomicData(polII_wt_peaks_X, score = viewMaxs(poIII_wt_peaks_X))
export(gd, "peaks.bed")

Note that you can do this for all chromosomes fairly easily:

polII_wt_peaks <- gdapply(cov_polII_wt, slice) # another GenomeData
export(as(polll_wt_peaks, "RangedData"), "peaks.bed")

I thought there might be a slice method for "GenomeData" that returned an
RleViewsList, but I could not find one. The same export line would work in
that case.


>
> (c) Finally, for your problem, there is a very easy solution. If you use
> the Integrated Genome Browser (IGB) to view yor data, you will see that it
> has a thresholding function. If you set it to a value, it marks all
> stretches of a track where the value exceeds the threshold.
>
>  The other thing is: if I use the R-2.10.1 version it does not
>> recognise the GenomeData object:
>>
>>> cov_polII_wt
>>>
>> A GenomeData instanceError in .local(x, ...) :
>>  no slot of name "metadata" for this object of class "GenomeData"
>>
>> Do you know why?
>> I would like to use the library chipseq, so maybe I should recreate
>> the cov_polII_wt under R-2.10.0????
>>
>>
> There is no version 2.10.1 yet of R. Do you mean R-2.9.1? Then, yes, you
> can have problems with objects created with R-2.9 and saved from there if
> you try to load them in R-2.10. Strictly speaking, the version of R does not
> matter but if you use R-2.9, biocLite takes the packages from Bioconductor
> 2.4 and, in R-2.10, it takes them from Bioc 2.5, which is still under
> development and keeps changing, so that you have to expect that objects
> become incompatible.
>
> If you want to use the chipseq package you should do everything with
> R-2.10.
>
>  sessionInfo()
>>>
>> R version 2.9.1 (2009-06-26)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>>
>> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] rtracklayer_1.4.1 RCurl_0.98-1      bitops_1.0-4.1    ShortRead_1.2.1
>> [5] lattice_0.17-25   BSgenome_1.12.3   Biostrings_2.12.7 IRanges_1.2.3
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.4.1 grid_2.9.1    hwriter_1.1   tools_2.9.1   XML_2.5-3
>>
>
> Cheers
>  Simon
>
>
> +---
> | Dr. Simon Anders, Dipl.-Phys.
> | European Molecular Biology Laboratory (EMBL), Heidelberg
> | office phone +49-6221-387-8632
> | preferred (permanent) e-mail: [email protected]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to