Re: [base] BASE 2 & Illumina arrays

Nicklas Nordborg Wed, 01 Aug 2007 00:50:31 -0700

Jeremy Davis-Turak wrote:
> Hi Nicklas,
> 
> I uploaded some data files to the Ticket.


Thanks a lot! That was exactly what we needed.

> 
> Here is a brief summary of what the data looks like:
> 
> 1)  Annotation data: CSV file.  It's too bad that it's a CSV, because
> some of the fields contain commas!

Hmmm... it looks hard to import this with existing importers. I'll have 
to start with the raw data import and leave this until later.
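
For later reference, a minimal sketch (in Python, not the BASE Java
API) of how embedded commas could be handled. It assumes that the
fields containing commas are enclosed in double quotes, which has not
been confirmed for these annotation files. The sample data and the
column names are hypothetical.

```python
import csv
import io

# Hypothetical sample: the second field contains a comma and is quoted.
# Whether the real annotation files quote such fields is an assumption.
SAMPLE = 'ProbeID,Definition,Symbol\n101,"kinase, putative",Abc1\n'


def read_annotations(text):
    """Return one dict per annotation row, keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_annotations(SAMPLE)
```

If the fields turn out not to be quoted, this approach does not work
and the columns cannot be split reliably on commas alone.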

> 2)  Data: (header is on ~ line 8)
> a) For each set of chips that are processed at the same time, there is
> one resulting file.  Thus, if you did two rat chips (each of which has
> 12 arrays on them), you would have 24 arrays contained in one file.
> b) Depending on the settings of the software at the time of scanning,
> you can have somewhere from 1-8 data columns per array (I don't know
> the exact range, but I know that it's variable).
> c)  The first column contains the probe IDs, the rest of them are data.
> d) Each data column name is a concatenation of 3 things:
>    i)  The data type (i.e. 'AVG_Signal' or 'BEAD_STDEV')
>   ii) The chip number (10 digits)
>    iii) A capital letter indicating the position of the array on the
> chip (i.e. A-F for human, A-H for mouse, or A-L  for rat.)
>    EXAMPLE: the first 8 columns in my rat file are:

It should be rather easy to create the raw bioassays. Once we have found 
the column headers, we can extract the chip number and the capital 
letter and use them as the name for the raw bioassays. The remaining 
parts of the headers should be easy to map to raw data properties (since 
you have already done this in the raw-data-types.xml for us).
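
The header parsing could be sketched roughly like this (Python, not the
BASE Java API). The exact separators between the three parts are not
shown in this thread, so the pattern assumes an optional '-', '_', '.'
or space between them. The function names and the bioassay naming
scheme are hypothetical.

```python
import re

# Assumed layout: <data type><sep><10-digit chip number><sep><letter>,
# where <sep> is an optional '-', '_', '.' or space. The real separator
# is not confirmed by the thread.
HEADER_RE = re.compile(
    r"^(?P<data_type>.+?)[-_. ]?(?P<chip>\d{10})[-_. ]?(?P<position>[A-L])$"
)


def parse_header(name):
    """Return (data_type, chip, position), or None if there is no match."""
    match = HEADER_RE.match(name.strip())
    if match is None:
        return None
    return match.group("data_type"), match.group("chip"), match.group("position")


def bioassay_name(chip, position):
    """Hypothetical naming scheme: chip number and array letter."""
    return f"{chip}_{position}"
```

A header that does not match, such as the probe ID column, yields None
and can be skipped.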

Do we have to worry about messed up files? For example, if there are 
AVG_Signal and BEAD_STDEV columns for one data set but only AVG_Signal 
for another?
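
Such a check could be sketched as follows (Python, not the BASE Java
API). It assumes the headers have already been split into (data type,
chip number, position) tuples and that every array is expected to carry
every data type seen anywhere in the file. The function name is
hypothetical.

```python
from collections import defaultdict


def find_incomplete_arrays(columns):
    """Map each (chip, position) to the data types it lacks.

    columns: iterable of (data_type, chip, position) tuples.
    Arrays that have every data type seen in the file are left out.
    """
    types_by_array = defaultdict(set)
    for data_type, chip, position in columns:
        types_by_array[(chip, position)].add(data_type)
    # Assumption: the union of all data types is what each array should have.
    all_types = set().union(*types_by_array.values())
    return {
        array: sorted(all_types - seen)
        for array, seen in types_by_array.items()
        if seen != all_types
    }
```

An empty result means all arrays have the same set of data columns. A
non-empty result lists what is missing, so the importer could either
abort or skip those arrays.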

We could simply stop there and let the users revert to manual work if 
they need to connect the imported raw bioassays with scans, array 
designs and experiments, but I think we can do a little bit more. I just 
have a few questions.

Should all raw bioassays be associated with a single scan (and thus the 
same hybridization) or do we need to associate the raw bioassays from 
each chip with a separate scan and hybridization?

It is difficult to associate the raw bioassays with array designs, since 
there are no spot coordinates in the file. We could fake this and use 
block=1, column=1 and row=row number in file. The benefit is that 
analysis will behave better if all raw bioassays are associated with the 
same array design. The drawback is that we must also fake the array 
design in the same way. It should be possible to use the existing 
ReporterMapImporter for this if we feed it the same raw data file.
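
The faked coordinates could look roughly like this (Python, not the
BASE Java API). It assumes "row number in file" means the position of
the data row, counting from 1 after the header line. The function name
and the dictionary keys are hypothetical.

```python
def fake_coordinates(probe_ids):
    """Assign block=1, column=1 and row=1-based data row position."""
    return [
        {"probe_id": probe_id, "block": 1, "column": 1, "row": row}
        for row, probe_id in enumerate(probe_ids, start=1)
    ]
```

This only gives consistent results if the array design and the raw data
are both imported from files that list the probes in the same order.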

I am also thinking of the possibility of using the plug-in from the 
Experiment view page if the experiment is of the 'illumina' data type.
Then, the raw bioassays created by the import could be assigned to the 
experiment by the plug-in, saving yet another manual step.

> 
> Thanks for making this plugin!

Well... it is not implemented yet...

>> The files will be put in a protected repository that is only
>> available to the core developers. 

Since you uploaded the files to our Trac I assume that you are not 
worried about other users seeing them. Is it ok to use some of the files 
in our regular test programs? They will not be included in the binary 
distribution, only in the source distribution and, of course, they will 
be available through direct Subversion access.

/Nicklas

