Err, 2nd time that I forgot to send it from my bioc-sig-sequencing registered email. Sorry about that :(
Leonardo On Mon, Sep 20, 2010 at 12:21 PM, Leonardo Collado Torres < [email protected]> wrote: > Hello Michal and Ivan, > > Thanks for the replies and your input :) > > In my use case all elements are from the "*" strand. In mixed cases as in > Ivan's example, the two workarounds I posted earlier do not work. I say so, > since I'm expecting just what Michael said: to treat "*" elements as from > the "+" strand. > > I tried using the method Michael coded, but I do not know how to call a > specific method. This must be pretty basic for you, but what is the way to > do so? > > Greetings, > Leonardo > > > library(GenomicRanges) > > ## Ivan's example GRanges > > > C <- GRanges(seqnames=c("chr1","chr2","chr19","chrX"), > + ranges=IRanges(start=c(0,0,5,1), > + end=c(150,150,150,400)), > + strand=c("*","-","*","+"), > + score=c(10,20,30,90)) > > C > GRanges with 4 ranges and 1 elementMetadata value > seqnames ranges strand | score > <Rle> <IRanges> <Rle> | <numeric> > [1] chr1 [0, 150] * | 10 > [2] chr2 [0, 150] - | 20 > [3] chr19 [5, 150] * | 30 > [4] chrX [1, 400] + | 90 > > seqlengths > chr1 chr19 chr2 chrX > NA NA NA NA > > ## The flank + shift workaround does not work, as the element on the "-" > strand is moved to position 152 instead of 150. > > flank(shift(C, 1), 1) > > GRanges with 4 ranges and 1 elementMetadata value > seqnames ranges strand | score > <Rle> <IRanges> <Rle> | <numeric> > [1] chr1 [ 0, 0] * | 10 > [2] chr2 [152, 152] - | 20 > > [3] chr19 [ 5, 5] * | 30 > [4] chrX [ 1, 1] + | 90 > > seqlengths > chr1 chr19 chr2 chrX > NA NA NA NA > > ## The workaround where you redifine the GRanges produces the same result > as Ivan's workaround. However, I would expect the "-" strand element "start" > to be position 150 and not position 0. > > GRanges( seqnames = seqnames(C), ranges = IRanges( start = start(C), > width=1), strand = strand(C)) > > GRanges with 4 ranges and 0 elementMetadata values > seqnames ranges strand | > <Rle> <IRanges> <Rle> | > [1] chr1 [0, 0] * | > [2] chr2 [0, 0] - | > [3] chr19 [5, 5] * | > [4] chrX [1, 1] + | > > > seqlengths > chr1 chr19 chr2 chrX > NA NA NA NA > > ## Current output of the resize function: > > resize(C, fix="start", width=1) > > GRanges with 4 ranges and 1 elementMetadata value > seqnames ranges strand | score > <Rle> <IRanges> <Rle> | <numeric> > [1] chr1 [ 75, 75] * | 10 > [2] chr2 [150, 150] - | 20 > [3] chr19 [ 77, 77] * | 30 > > [4] chrX [ 1, 1] + | 90 > > seqlengths > chr1 chr19 chr2 chrX > NA NA NA NA > > ## Methods available for "resize" > > showMethods("resize") > Function: resize (package IRanges) > x="CompressedIRangesList" > x="GRanges" > x="IRanges" > (inherited from: x="Ranges") > x="NormalIRanges" > x="Ranges" > x="RangesList" > > ## Failed attempt to use Michael's method > > > setMethod("resize", "GenomicRanges", > + function(x, width, fix = "start", use.names = TRUE) > + { > + revFix <- c(start = "end", end = "start", center = "center") > + fix <- ifelse(strand(x) == "-", revFix[fix], fix) > + ranges <- > + resize(ranges(x), width = width, fix = fix, use.names = > use.names) > + if (!IRanges:::anyMissing(seqlengths(x))) { > + start(x) <- start(ranges) > + end(x) <- end(ranges) > + } else { > + x <- clone(x, ranges = ranges) > + } > + x > + } > + ) > [1] "resize" > > ## Methods available for resize part II: > > showMethods("resize") > Function: resize (package IRanges) > x="CompressedIRangesList" > x="GenomicRanges" > x="GRanges" > x="IRanges" > (inherited from: x="Ranges") > x="NormalIRanges" > x="Ranges" > x="RangesList" > > ## Will it use the GRanges method or the GenomicRanges method for "C"? > > class(C) > [1] "GRanges" > attr(,"package") > [1] "GenomicRanges" > > ## It produces the same result from above. How can I use the new > "GenomicRanges" method for "resize" with the example GRanges "C"? My guess > is that it uses the "GRanges" method instead of the "GenomicRanges" one. > > resize(C, fix="start", width=1) > > GRanges with 4 ranges and 1 elementMetadata value > seqnames ranges strand | score > <Rle> <IRanges> <Rle> | <numeric> > [1] chr1 [ 75, 75] * | 10 > [2] chr2 [150, 150] - | 20 > [3] chr19 [ 77, 77] * | 30 > > [4] chrX [ 1, 1] + | 90 > > seqlengths > chr1 chr19 chr2 chrX > NA NA NA NA > > > ### Manually edited text: > ## This is the output I would expect from "resize" > > resize(C, fix="start", width=1) > > GRanges with 4 ranges and 1 elementMetadata value > seqnames ranges strand | score > <Rle> <IRanges> <Rle> | <numeric> > [1] chr1 [ 0, 0] * | 10 > [2] chr2 [150, 150] - | 20 > [3] chr19 [ 5, 5] * | 30 > > [4] chrX [ 1, 1] + | 90 > ### End of manually edited text > > > seqlengths > chr1 chr19 chr2 chrX > NA NA NA NA > > ## I updated GenomicRanges and IRanges using biocLite prior to running the > above pieces of code. > > > sessionInfo() > R version 2.12.0 Under development (unstable) (2010-09-08 r52880) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=en_US.utf8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GenomicRanges_1.1.25 IRanges_1.7.34 > > > > On Tue, Sep 14, 2010 at 4:13 PM, Ivan Gregoretti <[email protected]>wrote: > >> Hi Michael, since you are asking for opinions... >> >> >> When specifying the 'fix=' argument: >> >> In my view, fix="start" should always resize at the "start" regardless >> of strandedness. >> >> >> >> Default behaviour (when not specifying the 'fix=' argument): >> >> When the strand is "*", I would expect resize() to default to >> fix="center" rather than "start". >> >> When strands are "+" and "-" I would expect resize to default to >> fix="start" and fix="end' respectively. >> >> >> >> Thank you, >> >> Ivan >> >> >> >> On Tue, Sep 14, 2010 at 4:46 PM, Michael Lawrence >> <[email protected]> wrote: >> > I just checked in some changes to IRanges, that make this method work: >> > >> > setMethod("resize", "GenomicRanges", >> > function(x, width, fix = "start", use.names = TRUE) >> > { >> > revFix <- c(start = "end", end = "start", center = "center") >> > fix <- ifelse(strand(x) == "-", revFix[fix], fix) >> > ranges <- >> > resize(ranges(x), width = width, fix = fix, use.names = >> > use.names) >> > if (!IRanges:::anyMissing(seqlengths(x))) { >> > start(x) <- start(ranges) >> > end(x) <- end(ranges) >> > } else { >> > x <- clone(x, ranges = ranges) >> > } >> > x >> > } >> > ) >> > >> > That will accept the fix argument, except start and end are reversed for >> > negative strand features. '*' is treated just like '+'. If this is >> > acceptable to the GenomicRanges guys, I will commit this. >> > >> > On Tue, Sep 14, 2010 at 9:36 AM, Ivan Gregoretti <[email protected]> >> wrote: >> >> >> >> Hello Leonardo, >> >> >> >> I believe that the issue here is that resize() does not support the >> >> "fix" argument at all when handling GRanges. >> >> >> >> Actually that would be a nice upgrade of functionality for GRanges. >> >> >> >> I face the same limitation and I currently resize by hand. :( >> >> >> >> This is my work around: >> >> >> >> library(GenomicRanges) >> >> >> >> # a set of genomic features called C >> >> C <- GRanges(seqnames=c("chr1","chr2","chr19","chrX"), >> >> ranges=IRanges(start=c(0,0,5,1), >> >> end=c(150,150,150,400)), >> >> strand=c("*","-","*","+"), >> >> score=c(10,20,30,90)) >> >> >> >> >> >> # peek at C >> >> C >> >> GRanges with 4 ranges and 1 elementMetadata value >> >> seqnames ranges strand | score >> >> <Rle> <IRanges> <Rle> | <numeric> >> >> [1] chr1 [0, 150] * | 10 >> >> [2] chr2 [0, 150] - | 20 >> >> [3] chr19 [5, 150] * | 30 >> >> [4] chrX [1, 400] + | 90 >> >> >> >> seqlengths >> >> chr1 chr19 chr2 chrX >> >> NA NA NA NA >> >> >> >> # this is the workaround >> >> ranges(C) <- resize(ranges(C),1,fix="start") >> >> >> >> # peek at the resized set C >> >> C >> >> GRanges with 4 ranges and 1 elementMetadata value >> >> seqnames ranges strand | score >> >> <Rle> <IRanges> <Rle> | <numeric> >> >> [1] chr1 [0, 0] * | 10 >> >> [2] chr2 [0, 0] - | 20 >> >> [3] chr19 [5, 5] * | 30 >> >> [4] chrX [1, 1] + | 90 >> >> >> >> seqlengths >> >> chr1 chr19 chr2 chrX >> >> NA NA NA NA >> >> >> >> >> >> Cheers, >> >> >> >> Ivan >> >> >> >> >> >> Ivan Gregoretti, PhD >> >> National Institute of Diabetes and Digestive and Kidney Diseases >> >> National Institutes of Health >> >> 5 Memorial Dr, Building 5, Room 205. >> >> Bethesda, MD 20892. USA. >> >> Phone: 1-301-496-1016 and 1-301-496-1592 >> >> Fax: 1-301-496-9878 >> >> >> >> >> >> >> >> On Tue, Sep 14, 2010 at 11:36 AM, Leonardo Collado Torres >> >> <[email protected]> wrote: >> >> > Hello, >> >> > >> >> > I have a rather simple question that involves GenomicRanges' design. >> >> > >> >> > Basically, I have a GRanges object where all the elements are from >> the >> >> > undefined "*" strand. I just want to resize them to get the 1st (from >> >> > left >> >> > to right) base. However, I'm not able to do so with the "resize" >> >> > function >> >> > even when specifying fix = "start" as it uses the fix = "center" >> method. >> >> > Is >> >> > this the desired performance? I have 2 workarounds, but I'm puzzled >> as >> >> > the >> >> > "flank" function actually uses the start (left to right) when >> elements >> >> > are >> >> > from the "*" strand. Is there a quicker way to do this or should I >> stick >> >> > to >> >> > the flank + shift workaround? >> >> > >> >> > Thank you and greetings, >> >> > Leonardo >> >> > >> >> >> testGR <- GRanges( seqnames = rep("test", 3), ranges = IRanges ( >> start >> >> >> = >> >> > c(10,100,1000), width = c(10, 100, 1000)), strand = >> >> > Rle(strand(c("+","-")), >> >> > c(1,2)) ) >> >> >> testGR >> >> > GRanges with 3 ranges and 0 elementMetadata values >> >> > seqnames ranges strand | >> >> > <Rle> <IRanges> <Rle> | >> >> > [1] test [ 10, 19] + | >> >> > [2] test [ 100, 199] - | >> >> > [3] test [1000, 1999] - | >> >> > >> >> > seqlengths >> >> > test >> >> > NA >> >> >> resize(testGR, 1, fix="start") >> >> > GRanges with 3 ranges and 0 elementMetadata values >> >> > seqnames ranges strand | >> >> > <Rle> <IRanges> <Rle> | >> >> > [1] test [ 10, 10] + | >> >> > [2] test [ 199, 199] - | >> >> > [3] test [1999, 1999] - | >> >> > >> >> > seqlengths >> >> > test >> >> > NA >> >> >> testGR2 <- GRanges( seqnames = rep("test", 3), ranges = IRanges ( >> start >> >> >> = >> >> > c(10,100,1000), width = c(10, 100, 1000)), strand = >> Rle(strand(c("*")), >> >> > c(3)) ) >> >> >> testGR2 >> >> > GRanges with 3 ranges and 0 elementMetadata values >> >> > seqnames ranges strand | >> >> > <Rle> <IRanges> <Rle> | >> >> > [1] test [ 10, 19] * | >> >> > [2] test [ 100, 199] * | >> >> > [3] test [1000, 1999] * | >> >> > >> >> > seqlengths >> >> > test >> >> > NA >> >> >> resize(testGR2, 1, fix="start") >> >> > GRanges with 3 ranges and 0 elementMetadata values >> >> > seqnames ranges strand | >> >> > <Rle> <IRanges> <Rle> | >> >> > [1] test [ 14, 14] * | >> >> > [2] test [ 149, 149] * | >> >> > [3] test [1499, 1499] * | >> >> > >> >> > seqlengths >> >> > test >> >> > NA >> >> > >> >> >> testGR3 <- GRanges ( seqnames = seqnames(testGR2), ranges = IRanges( >> >> >> start >> >> > = start(testGR2), width = 1), strand = strand(testGR2) ) >> >> >> >> >> >> testGR3 >> >> > GRanges with 3 ranges and 0 elementMetadata values >> >> > seqnames ranges strand | >> >> > <Rle> <IRanges> <Rle> | >> >> > [1] test [ 10, 10] * | >> >> > [2] test [ 100, 100] * | >> >> > [3] test [1000, 1000] * | >> >> > >> >> > seqlengths >> >> > test >> >> > NA >> >> >> >> >> > >> >> >> testGR4 <- shift(flank( testGR2, 1), 1) >> >> >> testGR4 >> >> > GRanges with 3 ranges and 0 elementMetadata values >> >> > seqnames ranges strand | >> >> > <Rle> <IRanges> <Rle> | >> >> > [1] test [ 10, 10] * | >> >> > [2] test [ 100, 100] * | >> >> > [3] test [1000, 1000] * | >> >> > >> >> > seqlengths >> >> > test >> >> > NA >> >> > >> >> >> sessionInfo() >> >> > R version 2.12.0 Under development (unstable) (2010-09-08 r52880) >> >> > Platform: x86_64-unknown-linux-gnu (64-bit) >> >> > >> >> > locale: >> >> > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> >> > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >> >> > [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8 >> >> > [7] LC_PAPER=en_US.utf8 LC_NAME=C >> >> > [9] LC_ADDRESS=C LC_TELEPHONE=C >> >> > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> > >> >> > attached base packages: >> >> > [1] stats graphics grDevices utils datasets methods base >> >> > >> >> > other attached packages: >> >> > [1] GenomicRanges_1.1.25 IRanges_1.7.33 >> >> > >> >> > loaded via a namespace (and not attached): >> >> > [1] tools_2.12.0 >> >> > >> >> > [[alternative HTML version deleted]] >> >> > >> >> > _______________________________________________ >> >> > Bioc-sig-sequencing mailing list >> >> > [email protected] >> >> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing >> >> > >> >> >> >> _______________________________________________ >> >> Bioc-sig-sequencing mailing list >> >> [email protected] >> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing >> > >> > >> > > [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
