Re: [Bioc-sig-seq] Genominator: strategy for combining multiple AlignedRead objects

joseph franklin Mon, 19 Apr 2010 08:51:04 -0700

Great--I'll switch to the development version and give that a try.

Thanks-
Joe



On 19 Apr 2010, at 11:26, Kasper Daniel Hansen wrote:

> Hi Joe
> 
> This is addressed in the development version.  We now have the
> capability of giving importFromAlignedReads a (named) vector of
> filenames instead of a named list of AlignedRead objects.  This vector
> of filenames will be read in one at a time, so you just need enough
> memory to process a single lane.  I have processed around 160 lanes
> worth of data using this approach.
> 
> There is an extended example in the 'with ShortRead' vignette.
> 
> importFromAlignedReads also has the capability of directly summing
> several columns (if you need this).  So let us say you have 6 files
> (lanes) and you want to end up with a database with 2 columns
> (assuming you have a 3x2 experiment and you have decided to add up
> over the lanes).  Then you can do this using a construction where the
> names of the files are like
>  "a", "a", "a", "b", "b", "b"
> (this will create two columns named "a" and "b" each holding 3 lanes
> worth of data).
> 
> In this case, all 3 lanes will be read into memory at the same time -
> it is less memory efficient but it was much easier to code.  If that
> is impossible you should create a standard 6 column database and then
> use collapseExpData.  Summing inside importFromAlignedReads is more of a
> convenience (and speed) trick.
> 
> I uploaded a new version 1.1.6 yesterday which I recommend, because of
> some documentation updates.  This version should replace 1.1.5 on the
> Bioconductor development servers sometime tomorrow.
> 
> Kasper
> 
> 
> On Mon, Apr 19, 2010 at 11:06 AM, joseph franklin
> <[email protected]> wrote:
>> I'm addressing this to Jim Bullard, who has been really helpful answering 
>> some of my questions, as well as the list, in case anyone has some advice 
>> for me.
>> 
>> I've started using Genominator (I'm using the release version right now) to 
>> quantitate and analyze RNA-seq data, and have been really successful 
>> aggregating AlignedRead objects with my own annotation tables to produce 
>> per-gene counts.  I've done this with sets of 2-3 AlignedRead objects (each 
>> representing an Illumina lane), but I'd like to extend the approach to a few 
>> dozen lanes.  Since this is far too much data to fit in memory, I need an 
>> efficient way to combine many AlignedRead objects at once that doesn't rely 
>> on them being loaded as objects at the same time.
>> 
>> I imagine that I need to load the objects into tables using
>> importFromAlignedReads, and then join the appropriate columns, either before
>> or after aggregation (the manual hints that afterwards is preferable).  
>> However, there are a few points I'm confused about (probably resulting from
>> my limited experience with SQLite):
>> 
>> - I've been unable to load a SQLite database file that was previously
>> created with importFromAlignedReads.  What is the best way to open the
>> database connection, for instance, during a new R session?
>> 
>> - Can AlignedRead objects only be imported (via importFromAlignedReads) as
>> named lists of two or more objects?  What about single AlignedRead objects?  
>> I would imagine that a solution to my problem would be to create a separate 
>> table in a database file for each of my AlignedRead objects (I made a loop 
>> to do this), and then join these tables (as long as I can create a 
>> connection to the database).
>> 
>> I think my problems could be solved if I could load the AlignedRead objects
>> from multiple lanes into tables in a database file, load it, and join the
>> appropriate columns from the various tables (and then aggregate with the 
>> annotations in a single step--this would seem to be the most 
>> straightforward).  Any advice on accomplishing these steps would be much 
>> appreciated.
>> 
>> Thanks again,
>> Joe Franklin
>> 
>> ________________________________
>> Joseph Franklin
>> Department of Cell Biology
>> Yale University
>> 295 Congress Ave, BCMM 137
>> New Haven, CT 06519
>> USA
>> 
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> 
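
The idea discussed in this thread (load lanes into an SQLite file one at a time, sum lanes that share a name into a single column, then reconnect to the file later) can be illustrated outside of R. The following is only a minimal sketch in Python using the standard sqlite3 module. It is not Genominator's implementation or API: the function import_lanes, the table name "counts", and the toy read tuples are all hypothetical. Unlike the behaviour described above for importFromAlignedReads, this sketch keeps only one lane in memory even when names repeat.

```python
import os
import sqlite3
import tempfile
from collections import Counter


def import_lanes(db_path, lanes):
    """Load lanes one at a time; lanes sharing a name are summed into one column.

    `lanes` is a list of (column_name, reads) pairs, where `reads` is an
    iterable of (chromosome, position, strand) tuples for a single lane.
    (Hypothetical helper for illustration; not part of Genominator.)
    """
    columns = sorted({name for name, _ in lanes})
    for name in columns:
        # Column names are interpolated into SQL, so restrict them.
        if not name.isidentifier():
            raise ValueError(f"unsafe column name: {name!r}")

    con = sqlite3.connect(db_path)
    try:
        count_cols = ", ".join(f'"{c}" INTEGER NOT NULL DEFAULT 0' for c in columns)
        con.execute(
            "CREATE TABLE IF NOT EXISTS counts ("
            "chr TEXT, location INTEGER, strand INTEGER, "
            f"{count_cols}, PRIMARY KEY (chr, location, strand))"
        )
        for name, reads in lanes:
            # Only this lane's per-position tallies are held in memory.
            tally = Counter(reads)
            # Make sure every position seen in this lane has a row.
            con.executemany(
                "INSERT OR IGNORE INTO counts (chr, location, strand) "
                "VALUES (?, ?, ?)",
                list(tally),
            )
            # Add this lane's counts to the column it is named after.
            con.executemany(
                f'UPDATE counts SET "{name}" = "{name}" + ? '
                "WHERE chr = ? AND location = ? AND strand = ?",
                [(n, *key) for key, n in tally.items()],
            )
            con.commit()
    finally:
        con.close()


# Toy data: two lanes named "a" and one named "b".
db_path = os.path.join(tempfile.mkdtemp(), "lanes.db")
lanes = [
    ("a", [("chr1", 10, 1), ("chr1", 10, 1), ("chr1", 25, -1)]),
    ("a", [("chr1", 10, 1)]),
    ("b", [("chr1", 25, -1), ("chr2", 5, 1)]),
]
import_lanes(db_path, lanes)

# A later session only needs the file path to reconnect.
con = sqlite3.connect(db_path)
rows = con.execute(
    'SELECT chr, location, strand, "a", "b" FROM counts ORDER BY chr, location'
).fetchall()
con.close()
```

Here the two "a" lanes end up summed in one column and the "b" lane in another, and the final query runs on a fresh connection opened from the file path alone, which is the general SQLite pattern behind reopening a previously created database in a new session.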
