Thanks, Sean. That answers the question.

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878

On Thu, Sep 24, 2009 at 4:04 PM, Sean Davis <[email protected]> wrote:
> On Thu, Sep 24, 2009 at 2:40 PM, Ivan Gregoretti <[email protected]> wrote:
>> Hi Patrick,
>>
>> Great. It works.
>>
>> Can you clarify if the following observation is a feature or a bug?
>>
>> When I download
>>
>> http://dl.getdropbox.com/u/2051155/myTags.bed
>>
>> and from the unix prompt I take a peek at it, I get:
>>
>> head myTags.bed
>>
>> chr1 3002444 3002479 +
>> chr1 3002989 3003024 -
>> chr1 3017603 3017638 +
>> chr1 3017879 3017914 -
>> chr1 3018173 3018208 +
>> chr1 3018183 3018218 -
>> chr1 3018183 3018218 -
>> chr1 3019065 3019100 +
>> chr1 3019761 3019796 -
>> chr1 3020044 3020079 -
>>
>> fine. It shows the 36 bases long reads.
>>
>> Now I follow your suggestion loading it into R:
>>
>> suppressMessages(library(rtracklayer))
>>
>> myTags <- import('myTags.bed')
>>
>> ranges(myTags["chr1"])[[1]]
>> IRanges instance:
>>              start       end width
>> [1]        3002445   3002479    35
>> [2]        3002990   3003024    35
>> [3]        3017604   3017638    35
>> [4]        3017880   3017914    35
>> [5]        3018174   3018208    35
>> [6]        3018184   3018218    35
>> [7]        3018184   3018218    35
>> [8]        3019066   3019100    35
>> [9]        3019762   3019796    35
>> ...            ...       ...   ...
>> [322808] 197166880 197166914    35
>> [322809] 197167672 197167706    35
>> [322810] 197167851 197167885    35
>> [322811] 197185820 197185854    35
>> [322812] 197185850 197185884    35
>> [322813] 197188518 197188552    35
>> [322814] 197189251 197189285    35
>> [322815] 197189593 197189627    35
>> [322816] 197191697 197191731    35
>>
>> So, all start positions are shown as starting one nucleotide upstream
>> from the original record and the features are reported as being 35
>> bases long instead of 36.
>>
>> Is it feature or bug?
>
> Hi, Ivan. I think bed format is zero-based, half-open coordinates.
>
> http://genome.ucsc.edu/FAQ/FAQformat#format1
>
> Sean
>
>
>>
>> On Thu, Sep 24, 2009 at 2:51 AM, Patrick Aboyoun <[email protected]> wrote:
>>> Ivan,
>>> The RangedData class can store strand information in its values table. The
>>> values table can store any "vector-like" object from simple R vectors
>>> (including lists) to an instance of any of the *List classes defined in
>>> IRanges. If you use rtracklayer's import function on a bed file containing
>>> the information you have shown, the chromosome information will be used to
>>> segment the other values into spaces, the start and end values will be
>>> joined together in the ranges information (as a CompressedIRangesList
>>> object) and the strand information will be stored as a factor column across
>>> the values set (which is a CompressedDataFrameList object). The strand
>>> information can be accessed by the strand accessor function. If your data
>>> are sorted by strand within chromosome, you could add another level of
>>> compression by storing the strand information as a 'factor' Rle in the
>>> values table instead of a plain factor. rtracklayer's export function is
>>> aware of a possible strand column in the values table and handles it
>>> appropriately when serializing a RangedData object back into a bed file.
>>>
>>>
>>> Patrick
>>>
>>>
>>> Ivan Gregoretti wrote:
>>>>
>>>> Hi everybody,
>>>>
>>>> What is the minimal container class for position-and-orientation of
>>>> Solexa reads?
>>>>
>>>>
>>>> For example, the minimal positional information should be something
>>>> like a BED record, like this
>>>>
>>>> chr1\t3000001\t3000036\t\t\t+\t
>>>> ...(and many more lines)...
>>>>
>>>> sorry for the cumbersome string but I just want to stress that the
>>>> minimal information is:
>>>>
>>>> column 1: chromosome
>>>> column 2: start
>>>> column 3: end
>>>> column 6: orientation, either 'plus', 'minus' or undefined.
>>>> (in this case a '+')
>>>>
>>>> Is there any compact container to load, say, 50 million records? I
>>>> thought that RangedData could do that but after reading the
>>>> documentation I see that it does not hold strand information.
>>>>
>>>> If there is such container, how do you load it up from a BED file?
>>>>
>>>> Thank you,
>>>>
>>>> Ivan
>>>>
>>>> Ivan Gregoretti, PhD
>>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>>> National Institutes of Health
>>>> 5 Memorial Dr, Building 5, Room 205.
>>>> Bethesda, MD 20892. USA.
>>>> Phone: 1-301-496-1592
>>>> Fax: 1-301-496-9878
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> [email protected]
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>
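
Worked example for the coordinate question in this thread. This is only a sketch in base R, and it assumes the convention described in the UCSC FAQ linked above: BED intervals are zero-based and half-open, while the ranges printed in R are one-based and closed. The helper name bed_to_one_based is hypothetical (it is not part of rtracklayer or IRanges); it just reproduces the arithmetic on the first three records quoted above.

```r
# First three records from the quoted `head myTags.bed` output.
# BED convention (assumed, per the UCSC FAQ): zero-based, half-open,
# i.e. the interval is [chromStart, chromEnd).
bed <- data.frame(
  chrom      = c("chr1", "chr1", "chr1"),
  chromStart = c(3002444L, 3002989L, 3017603L),
  chromEnd   = c(3002479L, 3003024L, 3017638L),
  strand     = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not an rtracklayer function: convert to
# one-based, closed coordinates [start, end].
bed_to_one_based <- function(chromStart, chromEnd) {
  data.frame(
    start = chromStart + 1L,          # shift start by one
    end   = chromEnd,                 # end is unchanged
    width = chromEnd - chromStart     # half-open length
  )
}

converted <- bed_to_one_based(bed$chromStart, bed$chromEnd)
print(converted)
```

Under that assumption the first record becomes start 3002445, end 3002479, width 35, which matches the imported ranges shown above. The BED records therefore describe 35-base intervals (3002479 - 3002444), and the width of 36 comes from counting both endpoints as if the BED start were one-based.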
