On Fri, Apr 23, 2010 at 11:28 AM, Ivan Gregoretti <[email protected]> wrote:
> Hi Michael,
>
> With the GRanges object, resizing becomes a breeze. Thank you.
>
> To leave this operation documented, here is my minimal code:
>
>
> library(rtracklayer) # needed by import()
> library(BSgenome.Mmusculus.UCSC.mm9) # needed for chromosome lengths
>
> # load the features
> A <- import('hundredmilliontags.bed.gz', 'bed')
>
> # coerce to GRanges
> A <- as(A, 'GRanges')
>
> # Be elegant, supply chromosome lengths
> seqlengths(A) <- sapply(names(seqlengths(A)),
> function(x){length(Mmusculus[[x]])})
>
>
The BSgenome object has a seqlengths() accessor, so it's just:
seqlengths(A) <- seqlengths(Mmusculus)[names(seqlengths(A))]
And btw, rtracklayer will change this devel cycle to at least optionally
output GRanges instead of RangedData.
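The named-vector lookup above can be tried in base R without loading any genome. A minimal sketch, assuming a toy vector `genome_lengths` (a hypothetical name) standing in for seqlengths() of the BSgenome object; the lengths are made up for illustration and are not real mm9 values.

```r
# Toy stand-in for seqlengths(<BSgenome>): a named integer vector.
# These values are hypothetical, NOT real mm9 chromosome lengths.
genome_lengths <- c(chr1 = 1000L, chr2 = 800L, chrX = 600L)

# Sequence names present in the data set, in the data set's order
# (stand-in for names(seqlengths(A))).
data_seqnames <- c("chrX", "chr1")

# Indexing a named vector by a character vector returns the matching
# elements in the requested order, names kept. One vectorized lookup
# replaces the sapply() loop over chromosomes.
data_lengths <- genome_lengths[data_seqnames]
print(data_lengths)
```

A name missing from the lookup vector comes back as NA rather than raising an error, so a naming mismatch (say, "chr1" versus "1") shows up as NA lengths.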
> # voila, proper resizing
> resize(A, width=200)
>
>
> Ivan
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
>
>
> On Fri, Apr 23, 2010 at 11:08 AM, Michael Lawrence
> <[email protected]> wrote:
> >
> >
> > On Fri, Apr 23, 2010 at 7:42 AM, Ivan Gregoretti <[email protected]> wrote:
> >>
> >> Hi Steve,
> >>
> >> What you showed works, no question, but I found resize() inconvenient
> >> to use on RangedData objects.
> >>
> >> For example, consider a more biological data set:
> >>
> >> Z <- RangedData(
> >> RangesList(
> >> chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
> >> chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))),
> >> score = c( 2, 7, 3, 1, 1, 1 ),
> >> strand= c('+','+','-','+','-','-') )
> >>
> >> > Z
> >> RangedData with 6 rows and 2 value columns across 2 spaces
> >> space ranges | score strand
> >> <character> <IRanges> | <numeric> <character>
> >> 1 chrA [1, 3] | 2 +
> >> 2 chrA [4, 5] | 7 +
> >> 3 chrA [6, 9] | 3 -
> >> 4 chrB [1, 3] | 1 +
> >> 5 chrB [3, 5] | 1 -
> >> 6 chrB [6, 9] | 1 -
> >>
> >> Here is the resize() inconvenience:
> >>
> >> resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
> >> Error in function (classes, fdef, mtable) :
> >> unable to find an inherited method for function "resize", for
> >> signature "RangedData"
> >>
> >> What does work is passing ranges(Z) rather than Z itself:
> >> > resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end'))
> >> SimpleRangesList of length 2
> >> $chrA
> >> IRanges of length 3
> >> start end width
> >> [1] 1 200 200
> >> [2] 4 203 200
> >> [3] -190 9 200
> >>
> >> $chrB
> >> IRanges of length 3
> >> start end width
> >> [1] 1 200 200
> >> [2] 3 202 200
> >> [3] -190 9 200
> >>
> >> But as you can see, the RangedData object is lost. You have to coerce it:
> >>
> >> > as(resize(ranges(Z), width=200,
> >> > fix=ifelse(Z$strand=='+','start','end')), 'RangedData')
> >> RangedData with 6 rows and 0 value columns across 2 spaces
> >> space ranges |
> >> <character> <IRanges> |
> >> 1 chrA [ 1, 200] |
> >> 2 chrA [ 4, 203] |
> >> 3 chrA [-190, 9] |
> >> 4 chrB [ 1, 200] |
> >> 5 chrB [ 3, 202] |
> >> 6 chrB [-190, 9] |
> >>
> >> Now I have a RangedData object, but the value columns are still lost,
> >> so I have to reconstruct it.
> >>
> >> [warning: the following command is obnoxious]
> >>
> >>
> >> > as(cbind(as.data.frame(as(resize(ranges(Z), width=200,
> >> > fix=ifelse(Z$strand=='+','start','end')), 'RangedData')),
> >> > as.data.frame(Z)[, -(1:4)]), 'RangedData')
> >> RangedData with 6 rows and 2 value columns across 2 spaces
> >> space ranges | score strand
> >> <character> <IRanges> | <numeric> <factor>
> >> 1 chrA [ 1, 200] | 2 +
> >> 2 chrA [ 4, 203] | 7 +
> >> 3 chrA [-190, 9] | 3 -
> >> 4 chrB [ 1, 200] | 1 +
> >> 5 chrB [ 3, 202] | 1 -
> >> 6 chrB [-190, 9] | 1 -
> >>
> >> Granted, it works, but wouldn't this be more convenient?
> >>
> >> resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
> >>
> >> Z is a tiny toy example; biological data sets regularly run to millions
> >> of rows. My set is over 100 million rows; as I write this, my 144GB RAM
> >> machine is doing the resizing the 'long way round', as obnoxiously
> >> shown. Still working...
> >>
> >> I wonder if there is a 'cheaper' way to resize a large RangedData
> >> instance. A better solution would be to upgrade resize(), but I am not
> >> that R-skilled. I hope the developers will consider it.
> >>
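One detail in the resize(ranges(Z), ...) output above: row 5 (chrB [3, 5], '-' strand) comes back as [3, 202], anchored at its start, whereas fixing its end at width 200 should give [-194, 5]. The chrB results look as if that space received the first three entries of the length-6 `fix` vector (start, start, end) instead of its own (start, end, end); that is an inference from the printed output, not something checked against the IRanges source. A base-R sketch of the per-space split, using the columns of Z (`fix_by_space` and `fix_prefix` are hypothetical names):

```r
# Columns transcribed from the toy RangedData Z in this thread.
space  <- c("chrA", "chrA", "chrA", "chrB", "chrB", "chrB")
strand <- c("+", "+", "-", "+", "-", "-")
fix    <- ifelse(strand == "+", "start", "end")

# Each space should receive only its own slice of the row-wise fix vector.
fix_by_space <- split(fix, space)
print(fix_by_space$chrB)

# What chrB appears to have been given instead: the first three entries.
fix_prefix <- fix[1:3]
print(fix_prefix)

# Row 5 is chrB [3, 5] on the '-' strand; fixing its end at width 200:
width      <- 200
row5_end   <- 5
row5_start <- row5_end - width + 1
print(c(row5_start, row5_end))
```

With GRanges, as Michael points out below, resize() reads the strand directly, so no separate `fix` vector has to be kept aligned with the data.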
> >
> > This would be a simple addition, but there is the bigger question of
> > whether RangedData should implement the Ranges API. It's really more of a
> > "dataset with ranges" than "ranges with data". RangedData *does* implement
> > the findOverlaps family of functions, since they are used so commonly.
> > There are also "short cuts" to the starts, ends and widths.
> >
> > You might find GRanges more convenient for your use-case. resize,GRanges
> > automatically considers the strand in the expected way.
> >
> > Also, there is a short-cut like:
> >
> > resizedRanges <- resize(ranges(Z), width=200,
> > fix=ifelse(Z$strand=='+', 'start', 'end'))
> > ranges(Z) <- resizedRanges
> >
> > Michael
> >
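The point of the replacement shortcut is that only the coordinates change while the value columns stay attached. The same idea in plain base R, assuming a hypothetical data.frame `df` as a stand-in for Z (this is not the RangedData API): overwrite the coordinate columns in place rather than rebuilding the table with cbind() and coercions.

```r
# Hypothetical plain data.frame standing in for the RangedData Z.
df <- data.frame(
  space  = c("chrA", "chrA", "chrA", "chrB", "chrB", "chrB"),
  start  = c(1, 4, 6, 1, 3, 6),
  end    = c(3, 5, 9, 3, 5, 9),
  score  = c(2, 7, 3, 1, 1, 1),
  strand = c("+", "+", "-", "+", "-", "-"),
  stringsAsFactors = FALSE
)

width <- 200
plus  <- df$strand == "+"

# Compute both new coordinates from the ORIGINAL columns first, then
# assign, so the end calculation never sees an already-modified start.
new_start <- ifelse(plus, df$start, df$end - width + 1)
new_end   <- ifelse(plus, df$start + width - 1, df$end)
df$start  <- new_start
df$end    <- new_end

# score and strand are untouched; no cbind()/coercion round trip needed.
print(df)
```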
> >>
> >> Thank you,
> >>
> >> Ivan
> >>
> >> Ivan Gregoretti, PhD
> >> National Institute of Diabetes and Digestive and Kidney Diseases
> >> National Institutes of Health
> >>
> >>
> >>
> >> On Thu, Apr 22, 2010 at 5:11 PM, Steve Lianoglou
> >> <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > On Thu, Apr 22, 2010 at 4:17 PM, Ivan Gregoretti <[email protected]> wrote:
> >> >> Hello everybody,
> >> >>
> >> >> How do you resize() the ranges of a RangedData object?
> >> >>
> >> >>
> >> >> In the past (IRanges 1.4.11), I could
> >> >>
> >> >> 1) extend forward 200 bases from the start in '+' ranges OR
> >> >> 2) extend backward 200 bases from the end in '-' ranges.
> >> >>
> >> >> The syntax was something like this:
> >> >>
> >> >> resize(ranges(A), width = 200, start = A$strand == "+")
> >> >>
> >> >> In IRanges 1.5.70, the "start" argument of resize() has been
> >> >> deprecated and replaced by "fix".
> >> >>
> >> >> Can somebody show how to get the task accomplished with the new
> >> >> resize()?
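Mapping the old logical argument onto the new one is a one-liner. A minimal sketch on a toy strand vector (the data and the names `start_flag` and `fix` are illustrative only): the logical that used to go to `start` becomes a character vector saying which end to hold fixed.

```r
# Toy strand vector (hypothetical data).
strand <- c("+", "-", "+", "-")

# Old style, as in the question: a logical, TRUE meaning "keep the start".
start_flag <- strand == "+"

# New style: a character vector naming the end to hold fixed.
fix <- ifelse(start_flag, "start", "end")
print(fix)
```

Anything other than "+" (including "*") maps to "end" here, and an NA strand yields an NA entry, so it may be worth checking for those before calling resize().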
> >> >
> >> > I'm pretty sure you use `fix` just like you use start:
> >> >
> >> > R> strands <- c("+", '-', '+', '-', '-')
> >> > R> ir <- IRanges(c(1,10,20,30, 40), width=5)
> >> > R> ir
> >> > IRanges of length 5
> >> > start end width
> >> > [1] 1 5 5
> >> > [2] 10 14 5
> >> > [3] 20 24 5
> >> > [4] 30 34 5
> >> > [5] 40 44 5
> >> >
> >> > R> resize(ir, width=8, fix=ifelse(strands == '+', 'start', 'end'))
> >> > IRanges of length 5
> >> > start end width
> >> > [1] 1 8 8
> >> > [2] 7 14 8
> >> > [3] 20 27 8
> >> > [4] 27 34 8
> >> > [5] 37 44 8
> >> >
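To see what `fix` does numerically, the arithmetic behind Steve's output can be reproduced with plain vectors. A sketch under the assumption that ranges are closed [start, end] intervals, as IRanges prints them; `resize_by_strand` is a hypothetical helper for illustration, not the IRanges implementation:

```r
# Hypothetical helper, NOT the IRanges implementation: the arithmetic a
# strand-aware resize performs on closed [start, end] intervals.
resize_by_strand <- function(start, end, strand, width) {
  keep_start <- strand == "+"
  new_start  <- ifelse(keep_start, start, end - width + 1L)
  new_end    <- ifelse(keep_start, start + width - 1L, end)
  data.frame(start = new_start, end = new_end,
             width = new_end - new_start + 1L)
}

strands <- c("+", "-", "+", "-", "-")
starts  <- c(1L, 10L, 20L, 30L, 40L)
ends    <- starts + 5L - 1L   # width 5, as in IRanges(..., width = 5)

res <- resize_by_strand(starts, ends, strands, width = 8L)
print(res)
```

'+' ranges keep their start and grow to the right; everything else keeps its end and grows to the left, which matches the five rows printed above.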
> >> > --
> >> > Steve Lianoglou
> >> > Graduate Student: Computational Systems Biology
> >> > | Memorial Sloan-Kettering Cancer Center
> >> > | Weill Medical College of Cornell University
> >> > Contact Info:
> >> > http://cbio.mskcc.org/~lianos/contact
> >> >
> >>
> >> _______________________________________________
> >> Bioc-sig-sequencing mailing list
> >> [email protected]
> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> >
> >
>