On Fri, Apr 23, 2010 at 7:42 AM, Ivan Gregoretti <[email protected]> wrote:
> Hi Steve,
>
> What you showed worked, no question, but I found resize() to be
> inconvenient to use with RangedData objects.
>
> For example, consider a more biological set of data
>
> Z <- RangedData(
> RangesList(
> chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
> chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))),
> score = c( 2, 7, 3, 1, 1, 1 ),
> strand= c('+','+','-','+','-','-') )
>
> > Z
> RangedData with 6 rows and 2 value columns across 2 spaces
> space ranges | score strand
> <character> <IRanges> | <numeric> <character>
> 1 chrA [1, 3] | 2 +
> 2 chrA [4, 5] | 7 +
> 3 chrA [6, 9] | 3 -
> 4 chrB [1, 3] | 1 +
> 5 chrB [3, 5] | 1 -
> 6 chrB [6, 9] | 1 -
>
> Here is the inconvenience with resize():
>
> resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "resize", for
> signature "RangedData"
>
> What does work is ranges(Z) rather than Z itself:
> > resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end'))
> SimpleRangesList of length 2
> $chrA
> IRanges of length 3
> start end width
> [1] 1 200 200
> [2] 4 203 200
> [3] -190 9 200
>
> $chrB
> IRanges of length 3
> start end width
> [1] 1 200 200
> [2] 3 202 200
> [3] -190 9 200
>
> but as you see, the RangedData object is lost. You have to coerce it:
>
> > as(resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end')),
> 'RangedData')
> RangedData with 6 rows and 0 value columns across 2 spaces
> space ranges |
> <character> <IRanges> |
> 1 chrA [ 1, 200] |
> 2 chrA [ 4, 203] |
> 3 chrA [-190, 9] |
> 4 chrB [ 1, 200] |
> 5 chrB [ 3, 202] |
> 6 chrB [-190, 9] |
>
> Now I got a RangedData object but the value columns are still lost. I
> have to reconstruct it.
>
> [warning: the following command is obnoxious]
>
>
> > as(cbind(as.data.frame(as(resize(ranges(Z), width=200,
> fix=ifelse(Z$strand=='+','start','end')), 'RangedData')),
> as.data.frame(Z)[,5:ncol(as.data.frame(Z))]), 'RangedData')
> RangedData with 6 rows and 2 value columns across 2 spaces
> space ranges | score strand
> <character> <IRanges> | <numeric> <factor>
> 1 chrA [ 1, 200] | 2 +
> 2 chrA [ 4, 203] | 7 +
> 3 chrA [-190, 9] | 3 -
> 4 chrB [ 1, 200] | 1 +
> 5 chrB [ 3, 202] | 1 -
> 6 chrB [-190, 9] | 1 -
>
> Granted, it works, but wouldn't this be more convenient?
>
> resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
>
> Z is a tiny toy example; biological data sets regularly have millions of
> rows. My set is over 100 million rows; as I write this, my 144GB RAM
> machine is doing the resizing the 'long way round', as obnoxiously
> shown above. Still working...
>
> I wonder if there is a 'cheaper' way to resize a large RangedData
> instance. A better solution would be to upgrade resize() but I am not
> that R-skilled. I hope the developers will consider it.
>
>
This would be a simple addition, but there is the bigger question of whether
RangedData should implement the Ranges API. It's really more of a "dataset
with ranges" than "ranges with data". RangedData *does* implement the
findOverlaps family of functions since they are used so commonly. There are
also "short cuts" to the starts, ends and widths.

You might find GRanges more convenient for your use case. The resize method
for GRanges automatically considers the strand in the expected way.
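
As a minimal sketch of the GRanges route (the coordinates and the score
column below are made up for illustration, and this assumes the GenomicRanges
package is installed), resize() anchors '+' ranges at their start and '-'
ranges at their end without an explicit fix vector:

```r
library(GenomicRanges)

# Hypothetical toy data: one '+' and one '-' range, 50 bases wide each.
gr <- GRanges(
  seqnames = c("chrA", "chrB"),
  ranges   = IRanges(start = c(1001, 2001), width = 50),
  strand   = c("+", "-"),
  score    = c(2, 7)
)

# The default fix = "start" is interpreted relative to the strand,
# so the '-' range keeps its end coordinate and grows to the left.
resized <- resize(gr, width = 200)

start(resized)  # 1001 1851
end(resized)    # 1200 2050
resized$score   # metadata columns are carried along: 2 7
```

Because the strand and the value columns live in the same object, nothing has
to be reattached after the resize.
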
Also, there is a shortcut like:

resizedRanges <- resize(ranges(Z), width=200,
                        fix=ifelse(Z$strand=='+', 'start', 'end'))
ranges(Z) <- resizedRanges
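
For very large inputs it may also help to see that the resize itself is plain
vector arithmetic. The sketch below uses only base R; resize_by_strand is a
hypothetical helper, not part of IRanges, and the input is Steve's five-range
example quoted below:

```r
# Hypothetical helper (not an IRanges function): resize to a fixed width,
# keeping the start of '+' ranges and the end of '-' ranges.
resize_by_strand <- function(start, end, strand, width) {
  plus <- strand == "+"
  data.frame(
    start = ifelse(plus, start, end - width + 1),
    end   = ifelse(plus, start + width - 1, end)
  )
}

strands <- c("+", "-", "+", "-", "-")
out <- resize_by_strand(
  start  = c(1, 10, 20, 30, 40),
  end    = c(5, 14, 24, 34, 44),
  strand = strands,
  width  = 8
)
out$start  # 1 7 20 27 37
out$end    # 8 14 27 34 44
```

The resulting start and end vectors could then be handed to
IRanges(start=, end=) if a ranges object is needed again.
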
Michael
> Thank you,
>
> Ivan
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
>
>
>
> On Thu, Apr 22, 2010 at 5:11 PM, Steve Lianoglou
> <[email protected]> wrote:
> > Hi,
> >
> > On Thu, Apr 22, 2010 at 4:17 PM, Ivan Gregoretti <[email protected]>
> > wrote:
> >> Hello everybody,
> >>
> >> How do you resize() the ranges of a RangedData object?
> >>
> >>
> >> In the past (IRanges 1.4.11), I could
> >>
> >> 1) extend forward 200 bases from the start in '+' ranges OR
> >> 2) extend backward 200 bases from the end in '-' ranges.
> >>
> >> The syntax was something like this:
> >>
> >> resize(ranges(A), width = 200, start = A$strand == "+")
> >>
> >> In IRanges 1.5.70, the "start" argument of resize() has been
> >> deprecated and replaced by "fix".
> >>
> >> Can somebody show how to get the task accomplished with the new
> >> resize()?
> >
> > I'm pretty sure you use `fix` just like you use start:
> >
> > R> strands <- c("+", '-', '+', '-', '-')
> > R> ir <- IRanges(c(1,10,20,30, 40), width=5)
> > R> ir
> > IRanges of length 5
> > start end width
> > [1] 1 5 5
> > [2] 10 14 5
> > [3] 20 24 5
> > [4] 30 34 5
> > [5] 40 44 5
> >
> > R> resize(ir, width=8, fix=ifelse(strands == '+', 'start', 'end'))
> > IRanges of length 5
> > start end width
> > [1] 1 8 8
> > [2] 7 14 8
> > [3] 20 27 8
> > [4] 27 34 8
> > [5] 37 44 8
> >
> > --
> > Steve Lianoglou
> > Graduate Student: Computational Systems Biology
> > | Memorial Sloan-Kettering Cancer Center
> > | Weill Medical College of Cornell University
> > Contact Info:
> > http://cbio.mskcc.org/~lianos/contact
> >
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>