On Fri, Apr 23, 2010 at 11:28 AM, Ivan Gregoretti <[email protected]> wrote:

> Hi Michael,
>
> With the GRanges object, resizing becomes a breeze. Thank you.
>
> For the purpose of leaving this operation documented, I will
> copy/paste my minimalist code:
>
>
> library(rtracklayer) # needed by import()
> library(BSgenome.Mmusculus.UCSC.mm9) # needed for chromosome lengths
>
> # load the features
> A <- import('hundredmilliontags.bed.gz', 'bed')
>
> # coerce to GRanges
> A <- as(A, 'GRanges')
>
> # Be elegant, supply chromosome lengths
> seqlengths(A) <- sapply(names(seqlengths(A)),
> function(x){length(Mmusculus[[x]])})
>
>
The BSgenome object has a seqlengths() accessor. So it's just:
seqlengths(A) <- seqlengths(Mmusculus)[names(seqlengths(A))]

And btw, rtracklayer will change this devel cycle to at least optionally
output GRanges instead of RangedData.

> # voila, proper resizing
> resize(A, width=200)
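
For anyone reading the archive without Bioconductor at hand, here is a minimal base-R sketch of what strand-aware resizing does to the coordinates. The helper name resize_stranded is hypothetical and is not part of IRanges or GenomicRanges. It assumes 1-based inclusive coordinates and treats every strand other than '+' as '-', which mirrors the ifelse() calls quoted further down in this thread.

```r
# Hypothetical helper (not the IRanges/GenomicRanges API): strand-aware resize.
resize_stranded <- function(start, end, strand, width) {
  stopifnot(length(start) == length(end), length(start) == length(strand))
  plus <- strand == "+"
  # '+' ranges keep their start and extend towards higher coordinates;
  # all other ranges keep their end and extend towards lower coordinates.
  new_start <- ifelse(plus, start, end - width + 1L)
  new_end   <- ifelse(plus, start + width - 1L, end)
  data.frame(start = new_start, end = new_end, strand = strand,
             stringsAsFactors = FALSE)
}

# Coordinates of the chrA toy ranges used later in this thread.
toy <- resize_stranded(start  = c(1L, 4L, 6L),
                       end    = c(3L, 5L, 9L),
                       strand = c("+", "+", "-"),
                       width  = 200L)
print(toy)
```

The third range ends up starting at a negative coordinate, as in the chrA output quoted below. This sketch does no clipping against chromosome lengths.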
>
>
> Ivan
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
>
>
> On Fri, Apr 23, 2010 at 11:08 AM, Michael Lawrence
> <[email protected]> wrote:
> >
> >
> > On Fri, Apr 23, 2010 at 7:42 AM, Ivan Gregoretti <[email protected]> wrote:
> >>
> >> Hi Steve,
> >>
> >> What you showed worked, no question, but I found resize() to be
> >> unprepared for convenient use with RangedData objects.
> >>
> >> For example, consider a more biological data set:
> >>
> >> Z <- RangedData(
> >>       RangesList(
> >>          chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
> >>          chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))),
> >>       score = c( 2, 7, 3, 1, 1, 1 ),
> >>       strand= c('+','+','-','+','-','-') )
> >>
> >> > Z
> >> RangedData with 6 rows and 2 value columns across 2 spaces
> >>        space    ranges |     score      strand
> >>  <character> <IRanges> | <numeric> <character>
> >> 1        chrA    [1, 3] |         2           +
> >> 2        chrA    [4, 5] |         7           +
> >> 3        chrA    [6, 9] |         3           -
> >> 4        chrB    [1, 3] |         1           +
> >> 5        chrB    [3, 5] |         1           -
> >> 6        chrB    [6, 9] |         1           -
> >>
> >> Here is the inconvenience with resize():
> >>
> >> resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
> >> Error in function (classes, fdef, mtable)  :
> >>  unable to find an inherited method for function "resize", for
> >> signature "RangedData"
> >>
> >> What does work is ranges(Z) rather than Z itself:
> >> > resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end'))
> >> SimpleRangesList of length 2
> >> $chrA
> >> IRanges of length 3
> >>    start end width
> >> [1]     1 200   200
> >> [2]     4 203   200
> >> [3]  -190   9   200
> >>
> >> $chrB
> >> IRanges of length 3
> >>    start end width
> >> [1]     1 200   200
> >> [2]     3 202   200
> >> [3]  -190   9   200
> >>
> >> but as you see, the RangedData object is lost. You have to coerce it:
> >>
> >> > as(resize(ranges(Z), width=200,
> >> > fix=ifelse(Z$strand=='+','start','end')), 'RangedData')
> >> RangedData with 6 rows and 0 value columns across 2 spaces
> >>        space      ranges |
> >>  <character>   <IRanges> |
> >> 1        chrA [   1, 200] |
> >> 2        chrA [   4, 203] |
> >> 3        chrA [-190,   9] |
> >> 4        chrB [   1, 200] |
> >> 5        chrB [   3, 202] |
> >> 6        chrB [-190,   9] |
> >>
> >> Now I got a RangedData object but the value columns are still lost. I
> >> have to reconstruct it.
> >>
> >> [warning: the following command is obnoxious]
> >>
> >>
> >> > as(cbind(as.data.frame(as(resize(ranges(Z), width=200,
> >> > fix=ifelse(Z$strand=='+','start','end')), 'RangedData')),
> >> > as.data.frame(Z)[,5:ncol(as.data.frame(Z))]), 'RangedData')
> >> RangedData with 6 rows and 2 value columns across 2 spaces
> >>        space      ranges |     score   strand
> >>  <character>   <IRanges> | <numeric> <factor>
> >> 1        chrA [   1, 200] |         2        +
> >> 2        chrA [   4, 203] |         7        +
> >> 3        chrA [-190,   9] |         3        -
> >> 4        chrB [   1, 200] |         1        +
> >> 5        chrB [   3, 202] |         1        -
> >> 6        chrB [-190,   9] |         1        -
> >>
> >> Granted, it works, but wouldn't this be more convenient?
> >>
> >> resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
> >>
> >> Z is a tiny toy example; biological sets regularly have millions of
> >> rows. My set is over 100 million rows; as I write this, my 144GB RAM
> >> machine is doing the resizing the 'long way round', as obnoxiously
> >> shown. Still working...
> >>
> >> I wonder if there is a 'cheaper' way to resize a large RangedData
> >> instance. A better solution would be to upgrade resize() but I am not
> >> that R-skilled. I hope the developers will consider it.
> >>
> >
> > This would be a simple addition, but there is the bigger question of
> > whether RangedData should implement the Ranges API. It's really more of a
> > "dataset with ranges" than "ranges with data". RangedData *does* implement
> > the findOverlaps family of functions since they are used so commonly.
> > There are also "short cuts" to the starts, ends and widths.
> >
> > You might find GRanges more convenient for your use-case. resize,GRanges
> > automatically considers the strand in the expected way.
> >
> > Also, there is a short-cut like:
> >
> > resizedRanges <- resize(ranges(Z), width=200,
> >                         fix=ifelse(Z$strand=='+', 'start', 'end'))
> > ranges(Z) <- resizedRanges
> >
> > Michael
> >
> >>
> >> Thank you,
> >>
> >> Ivan
> >>
> >> Ivan Gregoretti, PhD
> >> National Institute of Diabetes and Digestive and Kidney Diseases
> >> National Institutes of Health
> >>
> >>
> >>
> >> On Thu, Apr 22, 2010 at 5:11 PM, Steve Lianoglou
> >> <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > On Thu, Apr 22, 2010 at 4:17 PM, Ivan Gregoretti <[email protected]>
> >> > wrote:
> >> >> Hello everybody,
> >> >>
> >> >> How do you resize() the ranges of a RangedData object?
> >> >>
> >> >>
> >> >> In the past (IRanges 1.4.11), I could
> >> >>
> >> >> 1) extend forward 200 bases from the start in '+' ranges OR
> >> >> 2) extend backward 200 bases from the end in '-' ranges.
> >> >>
> >> >> The syntax was something like this:
> >> >>
> >> >> resize(ranges(A), width = 200, start = A$strand == "+")
> >> >>
> >> >> In IRanges 1.5.70, the "start" argument of resize() has been
> >> >> deprecated and replaced by "fix".
> >> >>
> >> >> Can somebody show how to get the task accomplished with the new
> >> >> resize()?
> >> >
> >> > I'm pretty sure you use `fix` just like you use start:
> >> >
> >> > R> strands <- c("+", '-', '+', '-', '-')
> >> > R> ir <- IRanges(c(1,10,20,30, 40), width=5)
> >> > R> ir
> >> > IRanges of length 5
> >> >    start end width
> >> > [1]     1   5     5
> >> > [2]    10  14     5
> >> > [3]    20  24     5
> >> > [4]    30  34     5
> >> > [5]    40  44     5
> >> >
> >> > R> resize(ir, width=8, fix=ifelse(strands == '+', 'start', 'end'))
> >> > IRanges of length 5
> >> >    start end width
> >> > [1]     1   8     8
> >> > [2]     7  14     8
> >> > [3]    20  27     8
> >> > [4]    27  34     8
> >> > [5]    37  44     8
> >> >
> >> > --
> >> > Steve Lianoglou
> >> > Graduate Student: Computational Systems Biology
> >> >  | Memorial Sloan-Kettering Cancer Center
> >> >  | Weill Medical College of Cornell University
> >> > Contact Info: 
> >> > http://cbio.mskcc.org/~lianos/contact
> >> >
> >>
> >> _______________________________________________
> >> Bioc-sig-sequencing mailing list
> >> [email protected]
> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> >
> >
>
