Re: [Bioc-sig-seq] Genominator: strategy for combining multiple AlignedRead objects

joseph franklin Tue, 20 Apr 2010 10:25:48 -0700

Kasper,
Thanks--importing from a vector of filenames was working perfectly.  However, 
after I upgraded to Genominator 1.1.6 today and ran what I think is the same 
import command, I got the error below.


I may be doing something wrong without realizing it.  
Thanks again,
Joe

(flyFiles is a vector of filenames)

> flydata <- importFromAlignedReads(x=flyFiles, type="Bowtie", chrMap=chrMap,
+     dbFilename="~/g/annotation/genominator/flydata.db", tablename="raw")
Error in importToExpData(data.frame(chr = chr, location = loc, strand = str),  : 
  After removing missing locations, df has no rows.
Timing stopped at: 0.94 0.06 0.999 

> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-04-18 r51771) 
x86_64-unknown-linux-gnu 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ShortRead_1.5.23     Rsamtools_0.2.8      lattice_0.18-5      
 [4] Biostrings_2.15.27   GenomicRanges_0.1.16 Genominator_1.1.6   
 [7] GenomeGraphs_1.7.2   biomaRt_2.3.5        IRanges_1.5.79      
[10] RSQLite_0.8-4        DBI_0.2-5           

loaded via a namespace (and not attached):
[1] Biobase_2.7.6 hwriter_1.2   RCurl_1.4-1   XML_2.8-1 



On 19 Apr 2010, at 11:26, Kasper Daniel Hansen wrote:

> Hi Joe
> 
> This is addressed in the development version.  We now have the
> capability of giving importFromAlignedReads a (named) vector of
> filenames instead of a named list of AlignedRead objects.  This vector
> of filenames will be read in one at a time, so you just need enough
> memory to process a single lane.  I have processed around 160 lanes
> worth of data using this approach.
> 
> There is an extended example in the 'with ShortRead' vignette.
> 
> importFromAlignedReads also has the capability of directly summing
> several columns (if you need this).  So let us say you have 6 files
> (lanes) and you want to end up with a database with 2 columns
> (assuming you have a 3x2 experiment and you have decided to add up
> over the lanes).  Then you can do this using a construction where the
> names of the files are like
>  "a", "a", "a", "b", "b", "b"
> (this will create two columns named "a" and "b" each holding 3 lanes
> worth of data).
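
A minimal base-R sketch of the naming scheme described above. The file
names and the `laneFiles` variable are hypothetical; only the idea that
repeated names mark lanes to be summed into one column comes from the
message. The import call at the end is left commented out because it needs
Genominator, real Bowtie output, and a `chrMap`; its arguments mirror the
call shown earlier in this thread, and the database path is a placeholder.

```r
# Hypothetical Bowtie output files for a 3x2 design (three lanes per condition).
laneFiles <- c("lane1.bowtie", "lane2.bowtie", "lane3.bowtie",
               "lane4.bowtie", "lane5.bowtie", "lane6.bowtie")

# Repeated names mark lanes that should end up in the same column.
names(laneFiles) <- rep(c("a", "b"), each = 3)

# Inspect the grouping implied by the names: one list element per future column.
lanesPerColumn <- split(unname(laneFiles), names(laneFiles))
print(lanesPerColumn)

# With Genominator loaded, the import would then follow the call quoted in
# this thread (not run here; "flydata.db" is a placeholder path):
# flydata <- importFromAlignedReads(x = laneFiles, type = "Bowtie",
#                                   chrMap = chrMap,
#                                   dbFilename = "flydata.db",
#                                   tablename = "raw")
```

Checking the grouping with `split()` before the import is a cheap way to
catch a mislabelled lane, since the import itself may take a long time.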
> 
> In this case, all 3 lanes will be read into memory at the same time -
> it is less memory efficient but it was much easier to code.  If that
> is impossible you should create a standard 6 column database and then
> use collapseExpData.  Summing within importFromAlignedReads is more of a
> convenience (and speed) trick.
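
To make the "six columns, then collapse" alternative concrete, here is a
base-R sketch of the arithmetic only. It uses a toy in-memory data frame
with invented counts as a stand-in for the database table, and it does not
call collapseExpData or reproduce its argument list; the column names and
the `groups` vector are assumptions made for illustration.

```r
# Toy stand-in for a six-column count table (one column per lane).
# Real data would live in the SQLite table created by the import.
counts <- data.frame(
  lane1 = c(1, 0, 2), lane2 = c(0, 1, 1), lane3 = c(3, 0, 0),
  lane4 = c(2, 2, 0), lane5 = c(0, 1, 4), lane6 = c(1, 0, 1)
)

# Group label for each lane column: the first three lanes form "a", the rest "b".
groups <- rep(c("a", "b"), each = 3)

# Sum the lane columns within each group, position by position.
collapsed <- sapply(split(names(counts), groups),
                    function(cols) rowSums(counts[, cols, drop = FALSE]))
print(collapsed)
```

The result has one column per group, which is the same shape the
repeated-names import would produce directly, but each lane only has to be
loaded on its own.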
> 
> I uploaded a new version 1.1.6 yesterday which I recommend, because of
> some documentation updates.  This version should replace 1.1.5 on the
> Bioconductor development servers sometime tomorrow.
> 
> Kasper
> 
> 
> On Mon, Apr 19, 2010 at 11:06 AM, joseph franklin
> <[email protected]> wrote:
>> I'm addressing this to Jim Bullard, who has been really helpful answering 
>> some of my questions, as well as the list, in case anyone has some advice 
>> for me.
>> 
>> I've started using Genominator (I'm using the release version right now) to 
>> quantitate and analyze RNA-seq data, and have been really successful 
>> aggregating AlignedRead objects with my own annotation tables to produce 
>> per-gene counts.  I've done this with sets of 2-3 AlignedRead objects (each 
>> representing an Illumina lane), but I'd like to extend the approach to a few 
>> dozen lanes.  Since this is far too much data to fit in memory, I need an 
>> efficient way to combine many AlignedRead objects at once that doesn't rely 
>> on them being loaded as objects at the same time.
>> 
>> I imagine that I need to load the objects into tables using the 
>> importFromAlignedReads, and then join the appropriate columns, either before 
>> or after aggregation (the manual hints that afterwards is preferable).  
>> However, there are a few points I'm confused about (probably resulting from 
>> my limited experience with SQLite):
>> 
>> - I've been unable to load a SQLite database file that was previously 
>> created with importFromAlignedReads--what is the best way to open the 
>> database connection--for instance, during a new R session?
>> 
>> - Can AlignedRead objects only be imported (via importFromAlignedReads) as 
>> named lists of two or more objects?  What about single AlignedRead objects?  
>> I would imagine that a solution to my problem would be to create a separate 
>> table in a database file for each of my AlignedRead objects (I made a loop 
>> to do this), and then join these tables (as long as I can create a 
>> connection to the database).
>> 
>> I think my problems could be solved if I could load the AlignedRead objects 
>> from multiple lanes into tables in a database file, load it, and join the 
>> appropriate columns from the various tables (and then aggregate with the 
>> annotations in a single step--this would seem to be the most 
>> straightforward).  Any advice on accomplishing these steps would be much 
>> appreciated.
>> 
>> Thanks again,
>> Joe Franklin
>> 
>> ________________________________
>> Joseph Franklin
>> Department of Cell Biology
>> Yale University
>> 295 Congress Ave, BCMM 137
>> New Haven, CT 06519
>> USA
>> 
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> 
