On Wed, Mar 2, 2011 at 9:44 AM, Vincent Carey <[email protected]> wrote:
> On Wed, Mar 2, 2011 at 9:58 AM, Martin Morgan <[email protected]> wrote:
> > On 03/01/2011 04:44 AM, Michael Lawrence wrote:
> >> Hi guys,
> >>
> >> What are the plans for the BamViews class. It looks like a useful
> >> foundation. One thing that would be good to have in R is a way to calculate
> >> "pileups" or base tallies for positions of interest. These counts could be
> >> broken down by sample (bamfile), cycle (position in the read), etc. Results
> >> returned as a DataFrame (in a format like that returned by as.data.frame on
> >> a table) that could be aggregated() up as desired. Rles would save memory.
> >> So there could be something like a alphabetFrequency() method for BamViews.
> >> This is related to Steve's recent work with counting over XStringSets.
> >
> > Hi Michael -- BamViews is definitely open for more development. The
> > methods currently implemented (minimal!) basically dispatch to
> > single-bam variants. And I guess there is no single-bam variant of what
> > you're looking for.
> >
> > Another possibility is to expose more of samtools, e.g., pileup /
> > mpileup, which might be returned more or less directly for manipulation
> > in R, or summarized. I'll work on this in the 3 week time frame (sorry)
>
> exposition of pileup/mpileup was what occurred to me also. i would hope it
> is not premature to express some concern with the downstream container for
> the outputs of these things. we have a pileup-output parser which delivers
> a GRanges and that is probably adequate, although decoding the pileup
> string might be a useful added value.
>
> mpileup delivers VCF/BCF and while we can scan these, some of the
> structures returned can only be interpreted by checking some file
> specification and it would be good to have some downstream data modeling
> based on use cases, that the mpileup interface could target.

Yes, it would be great to come up with some of these.
Most of the time, we're just looking to summarize the base counts at each
position in some way. It looks like samtools pileup is deprecated, so we
should be focusing on mpileup.

I had looked at mpileup, and better support for that makes sense, but the
amount of parsing/summarizing is such that it would almost be easier to
just write a tool from scratch that works directly off of the BAM files.
Actually, Tom Wu here has already done that with tools in his gsnap suite.
We've already begun work on a package to integrate that suite with R (which
is how I've solved my current problem). Although gsnap is public, it's not
very well known outside of Genentech. That's unfortunate, because it's a
great aligner, especially for RNA-seq data.

Anyway, thanks for looking into it, Martin.

Michael

> such developments could be important for the ISMB tutorial
> so i will be thinking more about this in coming weeks.
> >
> > Maybe Herve will weigh in on Steve's XStringSet sliding window
> > letterFrequencyAt
> >
> > Martin
> >
> >>
> >> Surely there are many other features that could be added. The above is just
> >> one that I would use often, across a number of contexts.
> >>
> >> Thanks,
> >> Michael
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioc-sig-sequencing mailing list
> >> [email protected]
> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> >
> >
> > --
> > Computational Biology
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N.
> > PO Box 19024 Seattle, WA 98109
> >
> > Location: M1-B861
> > Telephone: 206 667-2793
