Re: [base] BASE 2 Illumina arrays

2007-08-01 Thread Nicklas Nordborg
Jeremy Davis-Turak wrote:
 Hi Nicklas,
 
 I uploaded some data files to the Ticket.

Thanks a lot! That was exactly what we needed.

 
 Here is a brief summary of what the data looks like:
 
 1)  Annotation data: CSV file.  It's too bad that it's a CSV, because
 some of the fields contain commas!

Hmmm... it looks hard to import this with existing importers. I'll have 
to start with the raw data import and leave this until later.

 2)  Data: (header is on ~ line 8)
 a) For each set of chips that are processed at the same time, there is
 one resulting file.  Thus, if you did two rat chips (each of which has
 12 arrays on them), you would have 24 arrays contained in one file.
 b) Depending on the settings of the software at the time of scanning,
 you can have somewhere from 1-8 data columns per array (I don't know
 the exact range, but I know that it's variable).
 c)  The first column contains the probe IDs, the rest of them are data.
 d) Each data column name is a concatenation of 3 things:
    i)   The data type (e.g. 'AVG_Signal' or 'BEAD_STDEV')
    ii)  The chip number (10 digits)
    iii) A capital letter indicating the position of the array on the
         chip (e.g. A-F for human, A-H for mouse, or A-L for rat).
 EXAMPLE: the first 8 columns in my rat file are:

It should be rather easy to create the raw bioassays. Once we have found
the column headers, we can extract the chip number and the capital
letter and use them as names for the raw bioassays. The remaining parts of
the headers should be easy to map to raw data properties (since you have
already done this in the raw-data-types.xml for us).
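A minimal Python sketch of that grouping step follows. It assumes the header layout from Jeremy's example (<data type>-<10-digit chip number>_<capital letter>) and that the first column holds the probe IDs. The function name group_columns and the placeholder header 'ProbeID' are hypothetical and not part of BASE; the sketch only shows how one header row could be turned into one column map per raw bioassay.

```python
import re
from typing import Dict, List

# Assumed header layout, taken from the example in this thread:
# <data type>-<10-digit chip number>_<capital letter>, e.g. AVG_Signal-1677718123_A
COLUMN_RE = re.compile(r"^(?P<dtype>.+)-(?P<chip>\d{10})_(?P<pos>[A-Z])$")


def group_columns(header: List[str]) -> Dict[str, Dict[str, int]]:
    """Map each raw bioassay name ('<chip>_<letter>') to {data type: column index}.

    Column 0 is assumed to hold the probe IDs and is skipped.
    """
    bioassays: Dict[str, Dict[str, int]] = {}
    for index, name in enumerate(header[1:], start=1):
        match = COLUMN_RE.match(name.strip())
        if match is None:
            raise ValueError(f"Unrecognized column header: {name!r}")
        bioassay = f"{match['chip']}_{match['pos']}"
        bioassays.setdefault(bioassay, {})[match["dtype"]] = index
    return bioassays


# 'ProbeID' is a placeholder; the real name of the first column is not given here.
header = ["ProbeID",
          "AVG_Signal-1677718123_A", "BEAD_STDEV-1677718123_A",
          "AVG_Signal-1677718123_B", "BEAD_STDEV-1677718123_B"]
for name, columns in group_columns(header).items():
    print(name, columns)
# 1677718123_A {'AVG_Signal': 1, 'BEAD_STDEV': 2}
# 1677718123_B {'AVG_Signal': 3, 'BEAD_STDEV': 4}
```

The data-type part of each key (AVG_Signal, BEAD_STDEV, ...) is what would then be looked up against the properties defined in raw-data-types.xml.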

Do we have to worry about messed up files? For example, if there is 
AVG_Signal and BEAD_STDEV columns for one data set but only AVG_Signal 
for another?

We could simply stop there and let the users revert to manual work if
they need to connect the imported raw bioassays with scans, array
designs and experiments, but I think we can do a little bit more. I just
have a few questions.

Should all raw bioassays be associated with a single scan (and thus the
same hybridization) or do we need to associate the raw bioassays from
each chip with a separate scan and hybridization?

It is difficult to associate the raw bioassays with array designs, since 
there are no spot coordinates in the file. We could fake this and use 
block=1, column=1 and row=row number in file. The benefit is that 
analysis will behave better if all raw bioassays are associated with the 
same array design. The drawback is that we must also fake the array 
design in the same way. It should be possible to use the existing 
ReporterMapImporter for this if we feed it the same raw data file.
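The faked coordinates could be generated roughly as below. This is a minimal sketch, assuming rows are numbered from 1 among the data rows that follow the header; numbering by physical line in the file would work just as well, as long as the raw data import and the array design import use the same rule. The FakePosition type and fake_positions function are hypothetical.

```python
from typing import List, NamedTuple


class FakePosition(NamedTuple):
    """Hypothetical spot position: block and column are always 1."""
    block: int
    column: int
    row: int
    probe_id: str


def fake_positions(probe_ids: List[str]) -> List[FakePosition]:
    # row = 1-based position of the probe among the data rows of the file
    return [FakePosition(block=1, column=1, row=i, probe_id=p)
            for i, p in enumerate(probe_ids, start=1)]


for pos in fake_positions(["probe_x", "probe_y"]):
    print(pos)
```

Because the positions depend only on row order, both imports must read the same file (as suggested above) for the raw data to match the faked array design.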

I am also thinking of the possibility of using the plug-in from the 
Experiment view page if the experiment is of the 'illumina' data type.
Then, the raw bioassays created by the import could be assigned to the 
experiment by the plug-in, saving yet another manual step.

 
 Thanks for making this plugin!

Well... it is not implemented yet...

 The files will be put in a protected repository that is only
 available to the core developers. 

Since you uploaded the files to our Trac I assume that you are not 
worried about other users seeing them. Is it ok to use some of the files 
in our regular test programs? They will not be included in the binary 
distribution, only in the source distribution and of course from direct 
subversion access.

/Nicklas


-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject unsubscribe to
[EMAIL PROTECTED]


Re: [base] BASE 2 Illumina arrays

2007-08-01 Thread Jeremy Davis-Turak
On 8/1/07, Nicklas Nordborg [EMAIL PROTECTED] wrote:
 Jeremy Davis-Turak wrote:
  Hi Nicklas,
 
  I uploaded some data files to the Ticket.

 Thanks a lot! That was exactly what we needed.

 
  Here is a brief summary of what the data looks like:
 
  1)  Annotation data: CSV file.  It's too bad that it's a CSV, because
  some of the fields contain commas!

 Hmmm... it looks hard to import this with existing importers. I'll have
 to start with the raw data import and leave this until later.


Yeah, what I did was just open it in Excel and save it as a .txt file.
Not ideal, but the easiest way so far.
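If the fields that contain commas are enclosed in double quotes (standard CSV, though this thread does not say whether the annotation file does that), a CSV parser can do the same conversion without Excel. A minimal sketch using Python's csv module; the function name csv_to_tab_delimited and the sample data are made up for illustration.

```python
import csv
import io


def csv_to_tab_delimited(csv_text: str) -> str:
    """Re-write CSV text as tab-delimited text.

    A double-quoted field such as "kinase, putative" is kept as one value.
    Fields that contain commas but are NOT quoted cannot be recovered this way.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


sample = 'Probe,Definition\nP1,"kinase, putative"\n'
print(csv_to_tab_delimited(sample), end="")
```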


  2)  Data: (header is on ~ line 8)
  a) For each set of chips that are processed at the same time, there is
  one resulting file.  Thus, if you did two rat chips (each of which has
  12 arrays on them), you would have 24 arrays contained in one file.
  b) Depending on the settings of the software at the time of scanning,
  you can have somewhere from 1-8 data columns per array (I don't know
  the exact range, but I know that it's variable).
  c)  The first column contains the probe IDs, the rest of them are data.
  d) Each data column name is a concatenation of 3 things:
     i)   The data type (e.g. 'AVG_Signal' or 'BEAD_STDEV')
     ii)  The chip number (10 digits)
     iii) A capital letter indicating the position of the array on the
          chip (e.g. A-F for human, A-H for mouse, or A-L for rat).
  EXAMPLE: the first 8 columns in my rat file are:

 It should be rather easy to create the raw bioassays. Once we have found
 the column headers, we can extract the chip number and the capital
 letter and use them as names for the raw bioassays. The remaining parts of
 the headers should be easy to map to raw data properties (since you have
 already done this in the raw-data-types.xml for us).

 Do we have to worry about messed up files? For example, if there is
 AVG_Signal and BEAD_STDEV columns for one data set but only AVG_Signal
 for another?

I haven't encountered any messed up files yet.  However, I think it
would be easy to catch them, since you would be parsing the headers
anyway.  I don't know if other people have files like that, which they
wish to import, but for me, I would want the plugin to throw an error
in that case.
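A check along those lines might look like the sketch below: every raw bioassay must have the same set of data types, otherwise an error is raised. It assumes the columns have already been grouped per bioassay into {name: {data type: column index}}; that structure and the name check_consistent are hypothetical, not a BASE API.

```python
from typing import Dict


def check_consistent(bioassays: Dict[str, Dict[str, int]]) -> None:
    """Raise ValueError unless all bioassays have the same set of data types."""
    if not bioassays:
        raise ValueError("No data columns found")
    names = sorted(bioassays)
    expected = set(bioassays[names[0]])  # first bioassay is the reference
    for name in names[1:]:
        found = set(bioassays[name])
        if found != expected:
            missing = sorted(expected - found)
            extra = sorted(found - expected)
            raise ValueError(
                f"{name}: missing {missing}, unexpected {extra} "
                f"(compared with {names[0]})")


try:
    check_consistent({
        "1677718123_A": {"AVG_Signal": 1, "BEAD_STDEV": 2},
        "1677718123_B": {"AVG_Signal": 3},
    })
except ValueError as err:
    print(err)
# 1677718123_B: missing ['BEAD_STDEV'], unexpected [] (compared with 1677718123_A)
```

Failing early like this matches the preference above for an error rather than a partial import.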


 We could simply stop there and let the users revert to manual work if
 they need to connect the imported raw bioassays with scans, array
 designs and experiments, but I think we can do a little bit more. I just
 have a few questions.

 Should all raw bioassays be associated with a single scan (and thus the
 same hybridization) or do we need to associate the raw bioassays from
 each chip with a separate scan and hybridization?


I'll get back to you later to confirm this, but I believe I made one
hybridization and one scan.  For us, this made most sense because it
models what actually goes on.  I don't know what other groups prefer,
or if they require any functionality that is lost by having only one
hyb.

 It is difficult to associate the raw bioassays with array designs, since
 there are no spot coordinates in the file. We could fake this and use
 block=1, column=1 and row=row number in file. The benefit is that
 analysis will behave better if all raw bioassays are associated with the
 same array design. The drawback is that we must also fake the array
 design in the same way. It should be possible to use the existing
 ReporterMapImporter for this if we feed it the same raw data file.


We don't actually use array designs at this point, so I'm not sure how
to address this.  Faking it sounds fine to me.

However, as a side note, the reason there is no spot info is that for
Illumina, each array on each chip is different!  The scanning
software reads in a set of files which contain the array designs, and
spits out the gene_profile.csv file, which is actually the data
AVERAGED over all the beads for each probe.  So, if someone REALLY
wanted to get into the deeper level of analysis (bead-level), they
would have to upload some additional files (which I've never dealt
with).  Thus, I recommend not dealing with that layer just yet.

 I am also thinking of the possibility of using the plug-in from the
 Experiment view page if the experiment is of the 'illumina' data type.
 Then, the raw bioassays created by the import could be assigned to the
 experiment by the plug-in, saving yet another manual step.


That seems cool.  Would it be then easy to extend this feature to all
data types?

 
  Thanks for making this plugin!

 Well... it is not implemented yet...

  The files will be put in a protected repository that is only
  available to the core developers.

 Since you uploaded the files to our Trac I assume that you are not
 worried about other users seeing them. Is it ok to use some of the files
 in our regular test programs? They will not be included in the binary
 distribution, only in the source distribution and of course from direct
 subversion access.

Yes, you can use those data in your 

Re: [base] BASE 2 Illumina arrays

2007-07-31 Thread Nicklas Nordborg
Nicklas Nordborg wrote:
 Jeremy Davis-Turak wrote:
 I have implemented the incorporation of Illumina data into our BASE 
 system, along with a raw data importer.  Our version requires that the 
 gene_profile.csv be split up into separate files, which we have done 
 with an R script.  This can also be done manually in Excel, or feasibly 
 in a BASE script. 

 Nicklas, how do I share the plugin configurations with others?
 
 I have created a ticket in our trac system for this issue:
 http://base.thep.lu.se/ticket/486
 
 Everyone is welcome to comment and it is also possible to attach files 
 to it. Note that you have to be logged in before you can comment or 
 upload files. Use base/base as username/password.
 
 I think the best solution would be to have a BASE plugin that can split 
 the file automatically and then import the raw data in one go. Whether that 
 is possible I don't know, since I have no knowledge of the file format.

We have planned to include the Illumina import plug-in in the next 
release (2.4). We have not been able to find any good description of the 
file format or example data files. Does anybody out there have any 
information to help us implement this?

We definitely need one or more example data files that we can use for 
testing. The files will be put in a protected repository that is only 
available to the core developers. It would also be nice to have a more 
formal description, or at least a short overview, of the file format.

Since there are only 3 weeks to go before 2.4 is released, we need any 
information as soon as possible (this week!).

/Nicklas



Re: [base] BASE 2 Illumina arrays

2007-07-31 Thread Jeremy Davis-Turak
Hi Nicklas,

I uploaded some data files to the Ticket.

Here is a brief summary of what the data looks like:

1)  Annotation data: CSV file.  It's too bad that it's a CSV, because
some of the fields contain commas!

2)  Data: (header is on ~ line 8)
a) For each set of chips that are processed at the same time, there is
one resulting file.  Thus, if you did two rat chips (each of which has
12 arrays on them), you would have 24 arrays contained in one file.
b) Depending on the settings of the software at the time of scanning,
you can have somewhere from 1-8 data columns per array (I don't know
the exact range, but I know that it's variable).
c)  The first column contains the probe IDs, the rest of them are data.
d) Each data column name is a concatenation of 3 things:
   i)   The data type (e.g. 'AVG_Signal' or 'BEAD_STDEV')
   ii)  The chip number (10 digits)
   iii) A capital letter indicating the position of the array on the
        chip (e.g. A-F for human, A-H for mouse, or A-L for rat).
   EXAMPLE: the first 8 columns in my rat file are:

AVG_Signal-1677718123_A
BEAD_STDEV-1677718123_A
Avg_NBEADS-1677718123_A 
Detection-1677718123_A  
AVG_Signal-1677718123_B 
BEAD_STDEV-1677718123_B 
Avg_NBEADS-1677718123_B 
Detection-1677718123_B

and a number of columns later, they transition smoothly to the next chip:

Avg_NBEADS-1677718123_L 
Detection-1677718123_L  
AVG_Signal-1677718142_A 
BEAD_STDEV-1677718142_A 


In my R script, you have to hard-code the number of data columns per
array, and the number of arrays per chip.
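Instead of hard-coding those two numbers, they could be read off the header itself. A minimal sketch, assuming every data column follows the <type>-<chip>_<letter> pattern shown in the example above; infer_layout is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

# Assumed layout: <data type>-<10-digit chip number>_<capital letter>
COLUMN_RE = re.compile(r"^(?P<dtype>.+)-(?P<chip>\d{10})_(?P<pos>[A-Z])$")


def infer_layout(data_columns: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (data types seen, {chip number: array positions seen}).

    Both keep first-seen order; dicts preserve insertion order in Python 3.7+.
    """
    types: List[str] = []
    chips: Dict[str, List[str]] = {}
    for name in data_columns:
        m = COLUMN_RE.match(name.strip())
        if m is None:
            raise ValueError(f"Unrecognized column header: {name!r}")
        if m["dtype"] not in types:
            types.append(m["dtype"])
        positions = chips.setdefault(m["chip"], [])
        if m["pos"] not in positions:
            positions.append(m["pos"])
    return types, chips


columns = ["AVG_Signal-1677718123_A", "BEAD_STDEV-1677718123_A",
           "AVG_Signal-1677718123_B", "BEAD_STDEV-1677718123_B",
           "AVG_Signal-1677718142_A", "BEAD_STDEV-1677718142_A"]
types, chips = infer_layout(columns)
print(len(types), {chip: len(p) for chip, p in chips.items()})
# 2 {'1677718123': 2, '1677718142': 1}
```

len(types) is the number of data columns per array only if every array has the same columns, which is worth checking separately.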

Thanks for making this plugin!

Jeremy





[base] BASE 2 Illumina arrays

2007-02-05 Thread Emil Lundberg

Hi all,

Our site will likely wish to store Illumina array data in BASE 2 in
the near future. Sifting through the mailing list archive, this seems
indeed possible (based on the post below, from July '06).

My question is: has anyone implemented this already (Reha Y.?) and is
willing to share the specifications, and/or will this find its way
into BASE 2 proper some day?



regards,

/Emil


Emil Lundberg / Sysadm
Linnaeus Centre for Bioinformatics
Uppsala University



Reha Yildirimman wrote:
 Hello,

 Using Base2 we are trying to implement the Illumina Chip design. In order
 to use the raw data files describing coordinates and intensities of probes
 there is a need to define a new raw data type besides the given ones
 (affymetrix, agilent, spotfinder, ...).

 Is this already implemented, is there a way around the data type problem
 or am I looking at the wrong spot in base2?

Yes, you can add raw data types to Base2. You have to find and modify
the file 'raw-data-types.xml'. It should be located in the
basedir/www/WEB-INF/classes directory.

Add a tag raw-data-type for the Illumina Chip for example:

<raw-data-type
  id="illumina"
  channels="2"
  name="Illumina"
  table="RawDataIllumina"
  >
</raw-data-type>

Then add as many property tags as you need. The last step is to run
the basedir/bin/updatedb.sh script to create the new table.

/Nicklas







Re: [base] BASE 2 Illumina arrays

2007-02-05 Thread Sean Davis
On Monday 05 February 2007 13:30, Nicklas Nordborg wrote:
 Emil Lundberg wrote:
  Hi all,
 
  Our site will likely wish to store Illumina array data in BASE 2 in the
  near future. Sifting through the mailing list archive, this seems indeed
  possible (based on the post below, from july '06).
 
  My question is has anyone implemented this already (Reha Y.?) and is
  willing to share the specifications and/or will this find its way into
  BASE 2 proper some day?

 If somebody is willing to implement it. Right now we have to focus on
 other things. We can put it into the standard BASE distribution if someone:

 * comes up with a raw-data-type definition for the Illumina Chip
 * provides one or more plugin configurations for importing the data files

 For an example of the existing plugin configurations see:
 http://base.thep.lu.se/chrome/site/doc/admin/plugin_configuration/coreplugins.html

One of the issues with Illumina data is that several hybridizations happen on 
one array, all in the same channel.  Furthermore, the output files from 
image analysis are no longer one-hyb/one-file.  There are multiple hybs 
represented in one file.  This will make constructing a plugin difficult, I 
would think, but I could be wrong.

Sean



Re: [base] BASE 2 Illumina arrays

2007-02-05 Thread Jeremy Davis-Turak

I have implemented the incorporation of Illumina data into our BASE system,
along with a raw data importer.  Our version requires that the
gene_profile.csv be split up into separate files, which we have done with an
R script.  This can also be done manually in Excel, or feasibly in a BASE
script.
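The splitting step that the R script performs could be sketched in Python as below. It assumes the combined file has already been read into rows (probe ID in column 0) and that a column map {array name: {data type: column index}} has been built from the header; the function name split_by_array and the 'ProbeID' header are hypothetical placeholders.

```python
from typing import Dict, List


def split_by_array(rows: List[List[str]],
                   column_map: Dict[str, Dict[str, int]]) -> Dict[str, List[List[str]]]:
    """Split combined data rows into one table per array.

    Each output table starts with a header row: a probe ID column followed by
    the data types of that array, in the order given by column_map.
    """
    tables: Dict[str, List[List[str]]] = {}
    for array, cols in column_map.items():
        types = list(cols)
        table = [["ProbeID"] + types]  # placeholder name for the probe ID column
        for row in rows:
            table.append([row[0]] + [row[cols[t]] for t in types])
        tables[array] = table
    return tables


rows = [["p1", "10", "1", "20", "2"],
        ["p2", "11", "1.5", "21", "2.5"]]
column_map = {"1677718123_A": {"AVG_Signal": 1, "BEAD_STDEV": 2},
              "1677718123_B": {"AVG_Signal": 3, "BEAD_STDEV": 4}}
for array, table in split_by_array(rows, column_map).items():
    print(array, table)
```

Each table could then be written to its own file and imported with the existing single-array importer.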

Nicklas, how do I share the plugin configurations with others?

Jeremy

UCLA Department of Neurology

On 2/5/07, Nicklas Nordborg [EMAIL PROTECTED] wrote:


Emil Lundberg wrote:
 Hi all,

 Our site will likely wish to store Illumina array data in BASE 2 in the
 near future. Sifting through the mailing list archive, this seems indeed
 possible (based on the post below, from july '06).

 My question is has anyone implemented this already (Reha Y.?) and is
 willing to share the specifications and/or will this find its way into
 BASE 2 proper some day?

If somebody is willing to implement it. Right now we have to focus on
other things. We can put it into the standard BASE distribution if
someone:

* comes up with a raw-data-type definition for the Illumina Chip
* provides one or more plugin configurations for importing the data files

For an example of the existing plugin configurations see:

http://base.thep.lu.se/chrome/site/doc/admin/plugin_configuration/coreplugins.html

/Nicklas


