Please ignore my last email. I forgot to load the GenomicRanges library. Ivan
On Tue, Aug 10, 2010 at 3:13 PM, Ivan Gregoretti <[email protected]> wrote: > Hi Patrick, > > I tried the new GRanges functionality of import() and it works great. > > That leads me to a related question. How do you remove, say, the field > "score"? > > In the past, when I loaded my BED features into a RangedData object > called A, removing the field "score" was as simple as: > > A$score <- NULL > > Now that A is a GRanges object, how do I do the same? > > Thank you, > > Ivan > > > > > On Thu, Aug 5, 2010 at 8:06 PM, Patrick Aboyoun <[email protected]> wrote: >> I just checked in a patch to rtracklayer version 1.9.6 into BioC 2.7 that >> added asRangedData arguments to all of the import methods except import.bw >> as well as to the GenomicData function. When asRangedData = FALSE, these >> import methods will produce GRanges objects. I didn't optimize any of the >> code yet, so there may be another rev tomorrow with some tweaks to the >> import code at the R level. Here is are some examples: >> >>> library(rtracklayer) >>> bedfilepath <- system.file("tests", "test.bed", package = "rtracklayer") >>> import(bedfilepath, asRangedData = FALSE) >> GRanges with 9 ranges and 5 elementMetadata values >> seqnames ranges strand | name score >> thickStart >> <Rle> <IRanges> <Rle> | <character> <numeric> >> <integer> >> [1] chr7 [127471197, 127472363] + | Pos1 0 >> 127471196 >> [2] chr7 [127472364, 127473530] + | Pos2 0 >> 127472363 >> [3] chr7 [127473531, 127474697] + | Pos3 0 >> 127473530 >> [4] chr7 [127474698, 127475864] + | Pos4 0 >> 127474697 >> [5] chr7 [127475865, 127477031] - | Neg1 0 >> 127475864 >> [6] chr7 [127477032, 127478198] - | Neg2 0 >> 127477031 >> [7] chr7 [127478199, 127479365] - | Neg3 0 >> 127478198 >> [8] chr7 [127479366, 127480532] + | Pos5 0 >> 127479365 >> [9] chr7 [127480533, 127481699] - | Neg4 0 >> 127480532 >> thickEnd itemRgb >> <integer> <character> >> [1] 127472363 #FF0000 >> [2] 127473530 #FF0000 >> [3] 127474697 #FF0000 >> [4] 127475864 #FF0000 >> [5] 127477031 #0000FF >> [6] 127478198 #0000FF >> [7] 127479365 #0000FF >> [8] 127480532 #FF0000 >> [9] 127481699 #0000FF >> >> seqlengths >> chr7 >> NA >>> bed15filepath <- system.file("tests", "test.bed15", package = >>> "rtracklayer") >>> import(bed15filepath, asRangedData = FALSE) >> GRanges with 2 ranges and 13 elementMetadata values >> seqnames ranges strand | name score >> thickStart >> <Rle> <IRanges> <Rle> | <character> <numeric> >> <integer> >> [1] chr1 [159639973, 159640031] - | 2440848 500 >> 159639972 >> [2] chr1 [159640162, 159640190] - | 2440849 500 >> 159640161 >> thickEnd itemRgb blockCount blockSizes blockStarts breast_A >> <integer> <character> <integer> <character> <character> <numeric> >> [1] 159640031 NA 1 59, 0, 0.593 >> [2] 159640190 NA 1 29, 0, -0.906 >> breast_B breast_C cerebellum_A cerebellum_B >> <numeric> <numeric> <numeric> <numeric> >> [1] 1.196 -0.190 -1.088 0.093 >> [2] -1.247 0.111 -0.515 -0.057 >> >> seqlengths >> chr1 >> NA >>> sessionInfo() >> R version 2.12.0 Under development (unstable) (2010-08-01 r52659) >> Platform: i386-apple-darwin9.8.0/i386 (32-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] rtracklayer_1.9.6 RCurl_1.4-3 bitops_1.0-4.1 >> >> loaded via a namespace (and not attached): >> [1] Biobase_2.9.0 Biostrings_2.17.27 BSgenome_1.17.6 >> [4] GenomicRanges_1.1.20 IRanges_1.7.15 tools_2.12.0 >> [7] XML_3.1-0 >> >> >> Patrick >> >> >> On 8/5/10 1:11 PM, Michael Lawrence wrote: >> >> On Thu, Aug 5, 2010 at 10:45 AM, Patrick Aboyoun <[email protected]> wrote: >>> >>> Michael, >>> I just made a minor check-in to rtracklayer where I replaced use of >>> Biobase:listLen with IRanges::elementLenghts in an effort to minimize the >>> impact of Biobase on the sequence package stack. >>> >> >> Ok. It looks like elementLengths has been optimized since the last time I >> looked. >> >> >>> >>> Before I start the boulder rolling, how should I reconcile the UCSCData >>> class with the GRanges class? Once I have that sorted I can make changes to >>> import.bed and import.wig as well. >>> >> >> Well, eventually we'll want to stick the track line information on to >> GRanges. Could be done via a subclass like with UCSCData. metadata() is >> another option. I do actually use the subclass for dispatch purposes, pretty >> printing, etc. For right now though, the extra information could just be >> dropped if the user requests a GRanges. >> >>> >>> I originally named the argument asRangedData in the BSgenome methods to >>> reinforce that RangedData output is not intended to be the default and >>> conceptually the user is making an extra effort to produce a RangedData >>> object. >>> >>> >>> Patrick >>> >>> >>> On 8/5/10 4:32 AM, Michael Lawrence wrote: >>> >>> Makes sense. But why not make it asGRanges, which is shorter? Please go >>> ahead and check in your work so far. >>> >>> Thanks a lot, >>> Michael >>> >>> On Thu, Aug 5, 2010 at 12:51 AM, Patrick Aboyoun <[email protected]> >>> wrote: >>>> >>>> Michael, >>>> Breaking this down to two issues: >>>> >>>> Filtering >>>> Martin has been working on improving filtering in the ShortRead package >>>> to move from a read all then filter data to a block processing based >>>> filtering methodology. Lessons learned there can be brought to rtracklayer >>>> for large bed files and the like. >>>> >>>> import() output class >>>> Keeping the same API and just switching the import methods from producing >>>> RangedData (or UCSCData) output to GRanges output will break backward >>>> compatibility because the RangedData API is not wholly applicable to >>>> GRanges >>>> objects. I would not recommend this course since a number of packages in >>>> BioC and scripts in the wild expect the import methods to produce a >>>> RangedData (or UCSCData) object. An additional argument is not that onerous >>>> and can be fazed out over the course of two or three releases (1 - 1.5 >>>> years). Another alternative is to add a new import function (read.GRanges?) >>>> to rtracklayer that shares the same infrastructure as the existing import >>>> methods. >>>> >>>> I have a local copy of rtracklayer where I added a new asRangedData flag >>>> to the GenomicData function and import.gff* methods. I'll sit on this for >>>> now since these changes didn't take a lot of work. This is one of the >>>> situations where the managing the life cycle of the function specs is >>>> trickier than making the desired code changes. >>>> >>>> >>>> Cheers, >>>> Patrick >>>> >>>> >>>> On 8/4/10 8:24 PM, Michael Lawrence wrote: >>>> >>>> This might work, but it seems like an expensive optimization in that it >>>> changes a lot of the API. If someone cannot make a single copy of the data, >>>> it's unlikely that they're even going to be able to get to GenomicData() or >>>> manipulate it later. Perhaps the coercion function needs some simple >>>> tweaks? >>>> The filter support would definitely help. I'd rather keep things simple and >>>> return a single type, and GRanges sounds most appropriate. >>>> >>>> But I'm open to suggestions and further argument. >>>> >>>> Michael >>>> >>>> On Wed, Aug 4, 2010 at 2:05 PM, Patrick Aboyoun <[email protected]> >>>> wrote: >>>>> >>>>> Michael, >>>>> How integrated would you like to see the GRanges class in rtracklayer? >>>>> The rtracklayer::GenomicData constructor is the master instantiator. I >>>>> would >>>>> like to add an asRangedData = TRUE (default) argument to the GenomicData >>>>> function and push it all the way up through the import functions where >>>>> when >>>>> the user sets asRangedData = FALSE, the GenomicData function would create >>>>> a >>>>> GRanges object. This is what we did with the >>>>> {matchPWM,vmatchPattern,vmatchPDict},BSgenome-methods in the BSgenome >>>>> package and it as good a solution as any. This is a straight-forward >>>>> change >>>>> and wouldn't take too long to complete. >>>>> >>>>> >>>>> Patrick >>>>> >>>>> >>>>> On 8/4/10 12:56 PM, Michael Lawrence wrote: >>>>>> >>>>>> GRanges support is definitely on the TODO list. Filters are a good idea >>>>>> and >>>>>> also on the TODO list, possibly with a chunk size parameter to enable >>>>>> chunk >>>>>> processing. >>>>>> >>>>>> I'd love to have the GRanges stuff at least done by the next release. >>>>>> Patches welcome, of course :) >>>>>> >>>>>> Michael >>>>>> >>>>>> On Wed, Aug 4, 2010 at 8:08 AM, Ivan Gregoretti<[email protected]> >>>>>> wrote: >>>>>> >>>>>> >>>>>>> >>>>>>> Hello Michael and everyone, >>>>>>> >>>>>>> Would you please consider adding to import() the capacity to generate >>>>>>> a GRanges object rather than the default RangedData object? >>>>>>> >>>>>>> Also, >>>>>>> >>>>>>> Wouldn't it be great to be able to import() with filters just like >>>>>>> with readAligned()? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Justification >>>>>>> >>>>>>> GRanges is a biology-aware container. When importing large BEDs into >>>>>>> R, the current workflow involves creating RangedData first and then >>>>>>> converting to GRanges. >>>>>>> >>>>>>> If the BEDs are really big, holding both objects in memory at any >>>>>>> point in time is a hardware challenge. >>>>>>> >>>>>>> The capacity to filter the input would help in this case and in >>>>>>> general it would provide an increase in efficiency. >>>>>>> >>>>>>> >>>>>>> Thank you, >>>>>>> >>>>>>> Ivan >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Ivan Gregoretti, PhD >>>>>>> National Institute of Diabetes and Digestive and Kidney Diseases >>>>>>> National Institutes of Health >>>>>>> 5 Memorial Dr, Building 5, Room 205. >>>>>>> Bethesda, MD 20892. USA. >>>>>>> Phone: 1-301-496-1016 and 1-301-496-1592 >>>>>>> Fax: 1-301-496-9878 >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioc-sig-sequencing mailing list >>>>>>> [email protected] >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing >>>>>>> >>>>>>> >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> Bioc-sig-sequencing mailing list >>>>>> [email protected] >>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing >>>>>> >>>>> >>>> >>>> >>> >>> >> >> >> > _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
