On Fri, Sep 18, 2009 at 6:24 AM, Ivan Gregoretti <[email protected]> wrote:

> Hi Patrick and everybody,
>
>
> > To everyone,
> > What other data reduction operations would you like to have on bed file
> > import?
> >
> >
> > Patrick
>
> BED functionality must-haves:
>
> well, a very common task is to load all chromosome BED records but
> segregating by strand. In ChIP-seq analysis for example, an
> accumulation of forward reads on the left and reverse reads on the
> right is a good indicator of true peak presence.
>
> So, we need to be given the choice of loading "+", "-", or
> unspecified. The BED specification
> http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED
> says that a record without field number 6 (strand) is perfectly valid.
>
>
This would be a useful filter. I hope it's clear though that these types of
manipulations are pretty easy to do after loading the data, as well.
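
To make the "after loading" point concrete, here is a minimal sketch of strand filtering on already-loaded records. It is written in plain Python rather than against rtracklayer, and BedRecord, parse_bed_line and filter_by_strand are hypothetical names used only for illustration. It assumes whitespace-separated BED lines and treats a missing or "." strand field as unspecified.

```python
from collections import namedtuple

# Hypothetical in-memory record; only the fields needed here are kept.
BedRecord = namedtuple("BedRecord", "chrom start end name strand")

def parse_bed_line(line):
    """Parse one BED line; the strand (field 6) is optional."""
    fields = line.split()
    name = fields[3] if len(fields) > 3 else None
    # Anything other than "+" or "-" (including ".") counts as unspecified.
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else None
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name, strand)

def filter_by_strand(records, strands):
    """Keep the records whose strand is one of `strands` (None = unspecified)."""
    wanted = set(strands)
    return [r for r in records if r.strand in wanted]

lines = [
    "chr1\t100\t136\tread1\t0\t+",
    "chr1\t300\t336\tread2\t0\t-",
    "chr1\t500\t536",              # three-field record, no strand
]
records = [parse_bed_line(line) for line in lines]

forward = filter_by_strand(records, ["+"])
reverse = filter_by_strand(records, ["-"])
unstranded = filter_by_strand(records, [None])
```

The same predicate could just as well be applied while the file is being read, which is what an import-time filter would amount to.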


> Now, regarding the WIG block counting, the user should be able to
> specify the shiftSize. What's shiftSize? Well, each read is only the
> end of a DNA fragment that is typically 120 to 200 bases. So, the
> inferred centre of the fragment should be its start position plus
> 60 to 100 bases. If the fragment matches the reverse strand, then the
> inferred centre of the fragment should be its 'end' minus 60 to 100.
> That is the shiftSize.
>
> When no strand is specified, the centre of tag should be an acceptable
> choice.
>
> BED functionality to brag about:
>
> It would be extremely useful to be able to selectively load BED
> records contained in a set of genomic regions. (Something like the
> %in% functionality that Martin recently added to the ShortRead
> package.)
> So, let's imagine a tags-containing file and a big regions-containing
> file. Then we'd do
>
> myBigRegions <- import('myBigRegions.bed')
> insideRegions <- import('myTags.bed', in=myBigRegions, strand=c("+"))
> or also perhaps
> outsideRegions <- import('myTags.bed', not_in=myBigRegions, strand=c("+"))
>
>
All of this would be pretty easy to do within the proposed framework. Some
high-level functionality, like using estimated fragment length in the
coverage calculation, might belong in e.g. the chipseq package.
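
As a rough illustration of the two requests quoted above, the sketch below computes an inferred fragment centre from a shiftSize and selects reads contained in a set of regions. It is plain Python, not the rtracklayer or chipseq API. The function names fragment_centre and contained_in, the shift of 75 bases, and the example coordinates are all assumptions made for the example. Coordinates follow the BED convention (0-based start, exclusive end), and the region test is a simple linear scan, which a real implementation would presumably replace with an interval index.

```python
def fragment_centre(start, end, strand, shift_size):
    """Inferred centre of the fragment that a read came from.

    Forward reads are shifted right from their start, reverse reads are
    shifted left from their end, and unstranded reads use the tag centre.
    """
    if strand == "+":
        return start + shift_size
    if strand == "-":
        return end - shift_size
    return (start + end) // 2

def contained_in(chrom, start, end, regions):
    """True if the read lies entirely inside at least one region."""
    return any(c == chrom and s <= start and end <= e for c, s, e in regions)

# Hypothetical reads: (chrom, start, end, strand)
reads = [
    ("chr1", 100, 136, "+"),
    ("chr1", 300, 336, "-"),
    ("chr1", 500, 536, None),
    ("chr2", 50, 86, "+"),
]
regions = [("chr1", 0, 400)]   # (chrom, start, end)
shift_size = 75                # assumed value, within the 60 to 100 range

centres = [fragment_centre(s, e, strand, shift_size) for _, s, e, strand in reads]

inside_forward = [r for r in reads
                  if r[3] == "+" and contained_in(r[0], r[1], r[2], regions)]
outside = [r for r in reads if not contained_in(r[0], r[1], r[2], regions)]
```

Whether "contained" should mean full containment, as here, or any overlap is a design choice the importer would need to expose.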

Also, while BED is a common format, it's not the only one; really one wants
block processing for every track format (WIG, GFF, etc.), and this could even
be generalized beyond tracks to loading data of any type.

I know we've tossed around the idea of some sort of common I/O package. The
low-level callback mechanism could find a home there. There could be
incremental readers based directly on read.table and scan. Then rtracklayer
could provide a handler that translates the table into a RangedData, and
then delegates to the user and filter handlers.
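
The layering described above could look roughly like the following sketch: a low-level reader that yields fixed-size blocks of lines, a format handler that turns a block into records, and filter and user handlers applied to each block in turn. This is a Python illustration of the idea only, not the proposed R interface. The names iter_blocks, bed_block_to_records and process_in_blocks are hypothetical, and tuples stand in for a RangedData.

```python
import io
from collections import Counter
from itertools import islice

def iter_blocks(handle, block_size):
    """Low-level incremental reader: yield lists of up to block_size lines."""
    while True:
        block = list(islice(handle, block_size))
        if not block:
            return
        yield block

def bed_block_to_records(lines):
    """Format handler: turn raw BED lines into (chrom, start, end, strand)."""
    records = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else None
        records.append((fields[0], int(fields[1]), int(fields[2]), strand))
    return records

def process_in_blocks(handle, block_size, keep, user_handler):
    """Translate each block, apply the filter, then delegate to the user handler."""
    for block in iter_blocks(handle, block_size):
        records = [r for r in bed_block_to_records(block) if keep(r)]
        user_handler(records)

# Example: count forward-strand reads per chromosome, two lines at a time.
text = (
    "track name=tags\n"
    "chr1\t100\t136\tr1\t0\t+\n"
    "chr1\t300\t336\tr2\t0\t-\n"
    "chr2\t50\t86\tr3\t0\t+\n"
    "chr1\t700\t736\tr4\t0\t+\n"
)
counts = Counter()

def count_by_chrom(records):
    counts.update(chrom for chrom, _, _, _ in records)

process_in_blocks(io.StringIO(text), 2, lambda r: r[3] == "+", count_by_chrom)
```

Only one block is held in memory at a time, so the reduction (here, a per-chromosome count) never needs the whole file loaded.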

Michael

> Thank you,
>
> Ivan
>
>
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
>


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
