[base] high-throughput transcriptome sequencing data

Bob MacCallum Tue, 21 Oct 2008 05:35:29 -0700

Hi again.

Any thoughts on this (see below) at all?  Please reply off-list if you are
feeling shy.


I'd also like to raise some general questions about scalability.

1. Is anyone working on an Affymetrix SNP plugin?

2. Is anyone doing anything with tiling arrays?

I realise that archiving the .CEL files is no problem, and that BASE can run
analysis programs on those files through plugins.  But storing per-feature
data in the analysis tables is going to break once you have millions of
features, right?
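
To put rough numbers on that worry, here is a back-of-envelope sketch.  It
assumes one stored row per feature per bioassay, which is my reading of
"per-feature data" and not a confirmed description of the BASE schema.  The
function name and the figures in the loop are illustrative only.

```python
def per_feature_rows(n_features, n_bioassays):
    """Rows needed if every feature of every bioassay gets its own row.

    Assumption: one row per (feature, bioassay) pair.
    """
    return n_features * n_bioassays


if __name__ == "__main__":
    # Illustrative figures, not measurements from any BASE installation.
    for label, n_features in [("gene-level design", 20_000),
                              ("tiling/SNP design", 5_000_000)]:
        rows = per_feature_rows(n_features, n_bioassays=100)
        print(f"{label}: {rows:,} rows for 100 bioassays")
```

The row count grows linearly with the feature count, so going from tens of
thousands of features to millions multiplies the table size by the same
factor for every experiment.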

cheers,
Bob.

Bob MacCallum writes:
 > 
 > I'm just thinking out loud about how to incorporate high-throughput
 > transcriptome sequencing data into BASE.  It's some way off, but I'm assuming
 > that it will be cheap and quantitative enough to replace arrays at some point
 > during the renewal period of our project (2009-2014).
 > 
 > 1. Create an "array design" with all genes of interest (ideally this would be
 >    the largest set possible, e.g. known genes + predicted genes of all
 >    qualities, perhaps even predicted genes from the new sequence data).  The
 >    layout would be fictitious, of course (what's the minimum one can get away
 >    with?).
 > 
 > 2. Create a rawbioassay to correspond to each sequencing run.
 > 
 > Then *one* of 3a/b/c for each sequencing run/rawbioassay:
 > 
 > 3a. Outside BASE, align the new sequences to genome or transcript sequences
 >     and calculate "intensities" for each gene on the "array design" and dump
 >     into a tab delimited raw data file.  Attach that file to the rawbioassay
 >     and import numeric data as usual.
 > 
 > 3b. Upload the text file of sequences to the raw bioassay's "data file".
 >     Create a BASE plugin to do the alignment and quantification as in 3a,
 >     and load the numeric data into the database.
 > 
 > 3c. Similar to 3b, but calculate the intensities at the "create root
 >     bioassay" step, similar to the Affymetrix RMA plugin.
 > 
 > 4. Continue with analysis as normal.  Biosources, samples, etc. can be
 >    linked to the bioassay too, of course.
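
The quantification half of option 3a could be sketched roughly as below.
This is a minimal illustration, not a BASE API or an existing plugin.  It
assumes an external aligner has already assigned each read to at most one
gene, and it uses the raw read count as the "intensity" with no length or
depth normalisation.  The function names and the "gene_id"/"intensity"
column headers are hypothetical; a real raw data type would define its own
columns.

```python
import csv
import io
from collections import Counter


def count_reads_per_gene(assignments):
    """Count aligned reads per gene.

    assignments: iterable of (read_id, gene_id) pairs produced outside
    BASE; gene_id is None for reads that did not align (assumed format).
    """
    counts = Counter()
    for _read_id, gene_id in assignments:
        if gene_id is not None:
            counts[gene_id] += 1
    return counts


def write_raw_data(counts, design_genes, handle):
    """Write one tab-delimited row per gene on the fictitious array design.

    Genes with no aligned reads get an intensity of 0, so every run
    reports the same set of features.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", "intensity"])  # hypothetical column names
    for gene_id in design_genes:
        writer.writerow([gene_id, counts.get(gene_id, 0)])


if __name__ == "__main__":
    # Toy input standing in for one sequencing run.
    assignments = [("r1", "geneA"), ("r2", "geneB"),
                   ("r3", "geneA"), ("r4", None)]
    design = ["geneA", "geneB", "geneC"]
    buffer = io.StringIO()
    write_raw_data(count_reads_per_gene(assignments), design, buffer)
    print(buffer.getvalue(), end="")
```

Writing a row for every gene on the design, including zero counts, keeps
the file aligned with the "array design" from step 1, so it could be
attached to the rawbioassay and imported like any other tab-delimited raw
data file.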
 > 
 > I guess a new raw data type (for "Generic" platform) would have to be
 > created for 3a (and 3b?) but that's not difficult.
 > 
 > Is it possible to go with 3a, but also attach the sequence file to the raw
 > bioassay (or scan?) - something like keeping tiff files for scans?  Just for
 > documentation purposes.
 > 
 > Any thoughts from the community or developers?
 > 
 > cheers,
 > Bob.
 > 
 > -- 
 > Bob MacCallum | VectorBase Developer | Kafatos/Christophides Groups |
 > Division of Cell and Molecular Biology | Imperial College London |
 > Phone +442075941945 | Email [EMAIL PROTECTED]
 > 
 > _______________________________________________
 > The BASE general discussion mailing list
 > [email protected]
 > unsubscribe: send a mail with subject "unsubscribe" to
 > [EMAIL PROTECTED]

-- 
Bob MacCallum | VectorBase Developer | Kafatos/Christophides Groups |
Division of Cell and Molecular Biology | Imperial College London |
Phone +442075941945 | Email [EMAIL PROTECTED]

