Re: [Bioc-sig-seq] Genominator: strategy for combining multiple AlignedRead objects

Kasper Daniel Hansen Tue, 20 Apr 2010 13:56:37 -0700

Turns out I had the exact same R-devel on my system.  And it seems I
have the exact same version of the various packages you have and the
same search order.  On this version, the package vignette works.


So I guess this leaves one of the following possibilities: (1) your
flyFiles is somehow wrong, (2) withShortRead does not work on your
system, or (3) there is a bug in Genominator 1.1.6.

If I understand you correctly, the exact same code worked yesterday
with 1.1.5?  I am baffled.  I have a hard time seeing how a bug could
get introduced that would not also lead to withShortRead failing.

So could you do the following and send me the results (off-list):
(1) print flyFiles
(2) check that withShortRead works
(3) run importFromAlignedReads with verbose = TRUE
(4) debug(importToExpData) and step through it.
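
The checks above could be sketched roughly as follows. This is only a
sketch: check_files is a hypothetical helper name (it is not part of
Genominator), the demo file names are made up, and the Genominator steps
are left as comments because they need your own flyFiles and chrMap.

```r
# Hypothetical helper (not part of Genominator): summarise a vector of
# file names so that a wrong path or an empty file is easy to spot.
check_files <- function(files) {
  path <- unname(path.expand(files))
  nms <- names(files)
  if (is.null(nms)) nms <- rep(NA_character_, length(files))
  data.frame(
    name = unname(nms),
    path = path,
    exists = file.exists(path),
    bytes = file.info(path)$size,  # NA when the file is missing
    stringsAsFactors = FALSE
  )
}

# Self-contained demo with one real temporary file and one missing file.
demo_file <- tempfile(fileext = ".bowtie")
writeLines("placeholder", demo_file)
report <- check_files(c(lane1 = demo_file, lane2 = "/no/such/file.bowtie"))
print(report)

# With the real data the four steps would look something like:
# (1) print(flyFiles); print(check_files(flyFiles))
# (2) rerun the code from the withShortRead vignette
# (3) flydata <- importFromAlignedReads(x = flyFiles, type = "Bowtie",
#                                       chrMap = chrMap,
#                                       dbFilename = "~/g/annotation/genominator/flydata.db",
#                                       tablename = "raw", verbose = TRUE)
# (4) debug(importToExpData), then rerun the call in (3) and step with 'n'
```

A missing file or a zero-byte file in that report would point towards
possibility (1).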

Kasper


R version 2.12.0 Under development (unstable) (2010-04-18 r51771)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
 [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
 [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
 [7] LC_PAPER=en_US.iso885915       LC_NAME=C
 [9] LC_ADDRESS=C                   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] yeastRNASeq_0.0.3    ShortRead_1.5.23     Rsamtools_0.2.8
 [4] lattice_0.18-5       Biostrings_2.15.27   GenomicRanges_0.1.16
 [7] Genominator_1.1.6    GenomeGraphs_1.7.2   biomaRt_2.3.5
[10] IRanges_1.5.79       RSQLite_0.8-4        DBI_0.2-5

loaded via a namespace (and not attached):
[1] Biobase_2.7.6 hwriter_1.2   RCurl_1.4-1   tools_2.12.0  XML_2.8-1

On Tue, Apr 20, 2010 at 1:23 PM, joseph franklin
<[email protected]> wrote:
> Kasper,
> Thanks--importing from a vector of filenames was working perfectly.  However, 
> when I upgraded to Genominator 1.1.6 today, and ran the same import command 
> (I think), I get the error below.
>
> I may be doing something wrong without realizing it.
> Thanks again,
> Joe
>
> (flyFiles is a vector of filenames)
>
>> flydata<-importFromAlignedReads(x=flyFiles, type="Bowtie" , chrMap=chrMap, 
>> dbFilename="~/g/annotation/genominator/flydata.db", tablename="raw")
> Error in importToExpData(data.frame(chr = chr, location = loc, strand = str), 
>  :
>  After removing missing locations, df has no rows.
> Timing stopped at: 0.94 0.06 0.999
>
>> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-04-18 r51771)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] ShortRead_1.5.23     Rsamtools_0.2.8      lattice_0.18-5
>  [4] Biostrings_2.15.27   GenomicRanges_0.1.16 Genominator_1.1.6
>  [7] GenomeGraphs_1.7.2   biomaRt_2.3.5        IRanges_1.5.79
> [10] RSQLite_0.8-4        DBI_0.2-5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.7.6 hwriter_1.2   RCurl_1.4-1   XML_2.8-1
>
>
>
> On 19 Apr 2010, at 11:26, Kasper Daniel Hansen wrote:
>
>> Hi Joe
>>
>> This is addressed in the development version.  We now have the
>> capability of giving importFromAlignedReads a (named) vector of
>> filenames instead of a named list of AlignedRead objects.  This vector
>> of filenames will be read in one at a time, so you just need enough
>> memory to process a single lane.  I have processed around 160 lanes
>> worth of data using this approach.
>>
>> There is an extended example in the 'with ShortRead' vignette.
>>
>> importFromAlignedReads also has the capability of directly summing
>> several columns (if you need this).  So let us say you have 6 files
>> (lanes) and you want to end up with a database with 2 columns
>> (assuming you have a 3x2 experiment and you have decided to add up
>> over the lanes).  Then you can do this using a construction where the
>> names of the files are like
>>  "a", "a", "a", "b", "b", "b"
>> (this will create two columns named "a" and "b" each holding 3 lanes
>> worth of data).
>>
>> In this case, all 3 lanes will be read into memory at the same time -
>> it is less memory efficient but it was much easier to code.  If that
>> is impossible you should create a standard 6 column database and then
>> use collapseExpData.  Summing inside importFromAlignedReads is more of
>> a convenience (and speed) trick.
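
To make the effect of repeated names concrete, here is a toy sketch in
base R. It only illustrates the arithmetic (lanes sharing a name are
summed into one column); it is not how Genominator implements it, and the
counts, the lane names and the object names are made up.

```r
# Toy per-position counts (made-up numbers): 3 positions, one column per lane.
counts <- matrix(
  c(1, 0, 2,   # lane1
    0, 1, 1,   # lane2
    3, 0, 0,   # lane3
    2, 2, 0,   # lane4
    0, 1, 4,   # lane5
    1, 1, 1),  # lane6
  nrow = 3,
  dimnames = list(NULL, paste0("lane", 1:6))
)

# The names given to the six files decide which lanes are added together.
group <- c("a", "a", "a", "b", "b", "b")

# rowsum() sums rows by group, so transpose to put lanes in rows,
# sum, and transpose back: the result has one column per distinct name.
collapsed <- t(rowsum(t(counts), group = group))
print(collapsed)
```

The result has two columns, "a" and "b", each holding the sum of three
lanes at every position.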
>>
>> I uploaded a new version 1.1.6 yesterday which I recommend, because of
>> some documentation updates.  This version should replace 1.1.5 on the
>> Bioconductor development servers sometime tomorrow.
>>
>> Kasper
>>
>>
>> On Mon, Apr 19, 2010 at 11:06 AM, joseph franklin
>> <[email protected]> wrote:
>>> I'm addressing this to Jim Bullard, who has been really helpful answering 
>>> some of my questions, as well as the list, in case anyone has some advice 
>>> for me.
>>>
>>> I've started using Genominator (I'm using the release version right now) to 
>>> quantitate and analyze RNA-seq data, and have been really successful 
>>> aggregating AlignedRead objects with my own annotation tables to produce 
>>> per-gene counts.  I've done this with sets of 2-3 AlignedRead objects (each 
>>> representing an Illumina lane), but I'd like to extend the approach to a 
>>> few dozen lanes.  Since this is far too much data to fit in memory, I need 
>>> an efficient way to combine many AlignedRead objects at once that doesn't 
>>> rely on them being loaded as objects at the same time.
>>>
>>> I imagine that I need to load the objects into tables using the 
>>> importFromAlignedReads, and then join the appropriate columns, either 
>>> before or after aggregation (the manual hints that afterwards is 
>>> preferable).  However, there are a few points I'm confused about (probably 
>>> resulting from my limited experience with SQLite):
>>>
>>> - I've been unable to load a SQLite database file that was previously 
>>> created with importFromAlignedReads. What is the best way to open the 
>>> database connection, for instance during a new R session?
>>>
>>> - Can AlignedRead objects only be imported (via importFromAlignedReads) as 
>>> named lists of two or more objects?  What about single AlignedRead objects? 
>>>  I would imagine that a solution to my problem would be to create a 
>>> separate table in a database file for each of my AlignedRead objects (I 
>>> made a loop to do this), and then join these tables (as long as I can 
>>> create a connection to the database).
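
On the first question, a SQLite file written in one session can be
reopened later with plain DBI and RSQLite, roughly as sketched below. The
file here is a temporary stand-in and the table contents are made up; only
the table name "raw" is taken from this thread. Genominator also has an
ExpData() constructor for attaching to an existing table; see ?ExpData for
the arguments in your version.

```r
library(DBI)

# Stand-in for a database file created earlier (made-up contents).
db_file <- tempfile(fileext = ".db")
con <- dbConnect(RSQLite::SQLite(), dbname = db_file)
dbWriteTable(con, "raw",
             data.frame(chr = 1L, location = 100L, strand = 1L, lane1 = 3L))
dbDisconnect(con)

# Later, e.g. in a new R session: reconnect using only the file name.
con <- dbConnect(RSQLite::SQLite(), dbname = db_file)
tables <- dbListTables(con)                               # tables in the file
first_rows <- dbGetQuery(con, "SELECT * FROM raw LIMIT 5")  # peek at the data
dbDisconnect(con)

print(tables)
print(first_rows)
```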
>>>
>>> I think my problems could be solved if I could load the AlignedRead objects 
>>> from multiple lanes into tables in a database file, load it, and join the 
>>> appropriate columns from the various tables (and then aggregate with the 
>>> annotations in a single step--this would seem to be the most 
>>> straightforward).  Any advice on accomplishing these steps would be much 
>>> appreciated.
>>>
>>> Thanks again,
>>> Joe Franklin
>>>
>>> ________________________________
>>> Joseph Franklin
>>> Department of Cell Biology
>>> Yale University
>>> 295 Congress Ave, BCMM 137
>>> New Haven, CT 06519
>>> USA
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>
>
