On Thu, Sep 24, 2009 at 2:40 PM, Ivan Gregoretti <[email protected]> wrote:
> Hi Patrick,
>
> Great. It works.
>
> Can you clarify if the following observation is a feature or a bug?
>
> When I download
>
> http://dl.getdropbox.com/u/2051155/myTags.bed
>
> and from the unix prompt I take a peek at it, I get:
>
> head myTags.bed
>
> chr1 3002444 3002479 +
> chr1 3002989 3003024 -
> chr1 3017603 3017638 +
> chr1 3017879 3017914 -
> chr1 3018173 3018208 +
> chr1 3018183 3018218 -
> chr1 3018183 3018218 -
> chr1 3019065 3019100 +
> chr1 3019761 3019796 -
> chr1 3020044 3020079 -
>
> fine. It shows the 36 bases long reads.
>
> Now I follow your suggestion loading it into R:
>
> suppressMessages(library(rtracklayer))
>
> myTags <- import('myTags.bed')
>
> ranges(myTags["chr1"])[[1]]
> IRanges instance:
>              start       end width
> [1]        3002445   3002479    35
> [2]        3002990   3003024    35
> [3]        3017604   3017638    35
> [4]        3017880   3017914    35
> [5]        3018174   3018208    35
> [6]        3018184   3018218    35
> [7]        3018184   3018218    35
> [8]        3019066   3019100    35
> [9]        3019762   3019796    35
> ...            ...       ...   ...
> [322808] 197166880 197166914    35
> [322809] 197167672 197167706    35
> [322810] 197167851 197167885    35
> [322811] 197185820 197185854    35
> [322812] 197185850 197185884    35
> [322813] 197188518 197188552    35
> [322814] 197189251 197189285    35
> [322815] 197189593 197189627    35
> [322816] 197191697 197191731    35
>
> So, all start positions are shown as starting one nucleotide upstream
> from the original record and the features are reported as being 35
> bases long instead of 36.
>
> Is it feature or bug?

Hi, Ivan. I think bed format is zero-based, half-open coordinates.

http://genome.ucsc.edu/FAQ/FAQformat#format1

Sean

>
> On Thu, Sep 24, 2009 at 2:51 AM, Patrick Aboyoun <[email protected]> wrote:
>> Ivan,
>> The RangedData class can store strand information in its values table. The
>> values table can store any "vector-like" object from simple R vectors
>> (including lists) to an instance of any of the *List classes defined in
>> IRanges. If you use rtracklayer's import function on a bed file containing
>> the information you have shown, the chromosome information will be used to
>> segment the other values into spaces, the start and end values will be
>> joined together in the ranges information (as a CompressedIRangesList
>> object) and the strand information will be stored as a factor column across
>> the values set (which is a CompressedDataFrameList object). The strand
>> information can be accessed by the strand accessor function. If your data
>> are sorted by strand within chromosome, you could add another level of
>> compression by storing the strand information as a 'factor' Rle in the
>> values table instead of a plain factor. rtracklayer's export function is
>> aware of a possible strand column in the values table and handles it
>> appropriately when serializing a RangedData object back into a bed file.
>>
>>
>> Patrick
>>
>>
>> Ivan Gregoretti wrote:
>>>
>>> Hi everybody,
>>>
>>> What is the minimal container class for position-and-orientation of
>>> Solexa reads?
>>>
>>>
>>> For example, the minimal positional information should be something
>>> like a BED record, like this
>>>
>>> chr1\t3000001\t3000036\t\t\t+\t
>>> ...(and many more lines)...
>>>
>>> sorry for the cumbersome string but I just want to stress that the
>>> minimal information is:
>>>
>>> column 1: chromosome
>>> column 2: start
>>> column 3: end
>>> column 6: orientation, either 'plus', 'minus' or undefined. (in this case
>>> a '+')
>>>
>>> Is there any compact container to load, say, 50 million records? I
>>> thought that RangedData could do that but after reading the
>>> documentation I see that it does not hold strand information.
>>>
>>> If there is such container, how do you load it up from a BED file?
>>>
>>> Thank you,
>>>
>>> Ivan
>>>
>>> Ivan Gregoretti, PhD
>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>> National Institutes of Health
>>> 5 Memorial Dr, Building 5, Room 205.
>>> Bethesda, MD 20892. USA.
>>> Phone: 1-301-496-1592
>>> Fax: 1-301-496-9878
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>
>>
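
A small footnote on the coordinate question in this thread. The numbers in the quoted output are consistent with a conversion from zero-based, half-open BED coordinates to one-based, closed coordinates: the start moves up by one, the end stays put, and the length does not change. Below is a minimal base-R sketch of that arithmetic. The helper name bed_to_one_based is made up for illustration and is not an rtracklayer function; the three records are copied from the myTags.bed excerpt quoted above.

```r
# Hypothetical helper (not part of rtracklayer): convert BED-style
# zero-based, half-open intervals to one-based, closed intervals.
bed_to_one_based <- function(bed) {
  data.frame(
    chrom  = bed$chrom,
    start  = bed$start + 1L,        # shift the start by one; the end is unchanged
    end    = bed$end,
    width  = bed$end - bed$start,   # half-open length is end - start
    strand = bed$strand,
    stringsAsFactors = FALSE
  )
}

# First three records from the myTags.bed excerpt quoted in the thread.
bed <- data.frame(
  chrom  = c("chr1", "chr1", "chr1"),
  start  = c(3002444L, 3002989L, 3017603L),
  end    = c(3002479L, 3003024L, 3017638L),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

converted <- bed_to_one_based(bed)
print(converted)

# The closed-interval width (end - start + 1) equals the BED length (end - start).
stopifnot(all(converted$end - converted$start + 1L == converted$width))
```

Read this way, a BED record such as 3002444 to 3002479 spans 3002479 - 3002444 = 35 bases, which agrees with the width column in the IRanges output.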

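
On Patrick's remark that strand-sorted data compresses further when stored as an Rle: the idea can be illustrated with base R's rle(), used here only as a stand-in for the IRanges Rle class so that the sketch runs without Bioconductor. The toy strand vectors are invented for the example.

```r
# Base R rle() stands in for the IRanges 'Rle' class here (an assumption
# made so the sketch runs without Bioconductor installed).
strand_sorted   <- rep(c("+", "-"), each = 5)    # reads sorted by strand
strand_unsorted <- rep(c("+", "-"), times = 5)   # strands alternating

runs_sorted   <- rle(strand_sorted)
runs_unsorted <- rle(strand_unsorted)

# Number of runs that have to be stored in each case.
print(length(runs_sorted$lengths))
print(length(runs_unsorted$lengths))

# The original vector can be recovered from the run-length encoding.
stopifnot(identical(inverse.rle(runs_sorted), strand_sorted))
```

The sorted vector collapses into one run per strand, while the alternating vector needs one run per element, so the gain depends entirely on how the reads are ordered.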