Re: [base] BASE 2 Illumina arrays
Jeremy Davis-Turak wrote:

> Hi Nicklas,
> I uploaded some data files to the Ticket.

Thanks a lot! That was exactly what we needed.

> Here is a brief summary of what the data looks like:
>
> 1) Annotation data: CSV file. It's too bad that it's a CSV, because some of the fields contain commas!

Hmmm... it looks hard to import this with existing importers. I'll have to start with the raw data import and leave this until later.

> 2) Data: (header is on ~ line 8)
>    a) For each set of chips that are processed at the same time, there is one resulting file. Thus, if you did two rat chips (each of which has 12 arrays on it), you would have 24 arrays contained in one file.
>    b) Depending on the settings of the software at the time of scanning, you can have somewhere from 1-8 data columns per array (I don't know the exact range, but I know that it's variable).
>    c) The first column contains the probe IDs, the rest of them are data.
>    d) Each data column name is a concatenation of 3 things:
>       i) The data type (e.g. 'AVG_Signal' or 'BEAD_STDEV')
>       ii) The chip number (10 digits)
>       iii) A capital letter indicating the position of the array on the chip (i.e. A-F for human, A-H for mouse, or A-L for rat.)
>
> EXAMPLE: the first 8 columns in my rat file are:

It should be rather easy to create the raw bioassays. Once we have found the column headers, we can extract the chip number and the capital letter and use them as the name for each raw bioassay. The remaining parts of the headers should be easy to map to raw data properties (since you have already done this in the raw-data-types.xml for us).

Do we have to worry about messed up files? For example, if there are AVG_Signal and BEAD_STDEV columns for one data set but only AVG_Signal for another?

We could simply stop there and let the users revert to manual work if they need to connect the imported raw bioassays with scans, array designs and experiments, but I think we can do a little bit more. I just have a few questions.

Should all raw bioassays be associated with a single scan (and thus the same hybridization) or do we need to associate the raw bioassays from each chip with a separate scan and hybridization?

It is difficult to associate the raw bioassays with array designs, since there are no spot coordinates in the file. We could fake this and use block=1, column=1 and row=row number in file. The benefit is that analysis will behave better if all raw bioassays are associated with the same array design. The drawback is that we must also fake the array design in the same way. It should be possible to use the existing ReporterMapImporter for this if we feed it the same raw data file.

I am also thinking of the possibility of using the plug-in from the Experiment view page if the experiment is of the 'illumina' data type. Then, the raw bioassays created by the import could be assigned to the experiment by the plug-in, saving yet another manual step.

> Thanks for making this plugin!

Well... it is not implemented yet...

The files will be put in a protected repository that is only available to the core developers. Since you uploaded the files to our Trac I assume that you are not worried about other users seeing them. Is it ok to use some of the files in our regular test programs? They will not be included in the binary distribution, only in the source distribution and of course from direct subversion access.

/Nicklas

___
The BASE general discussion mailing list basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject unsubscribe to [EMAIL PROTECTED]
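
The column-name convention described in this message (data type, 10-digit chip number, position letter) and the proposed fake coordinates (block=1, column=1, row=row number in file) can be sketched in Python as below. This is only an illustration, not the BASE plug-in or any BASE API. The thread does not say which separator characters, if any, join the three parts, so the pattern tolerates a few common ones as an assumption. The function names, the `<chip>_<letter>` bioassay name, the `PROBE_ID` column name and the chip number 1234567890 are all made up for the example.

```python
import re

# Assumption: the three parts may be joined directly or with '-', '_', '.'
# or whitespace; the thread does not specify the separator.
HEADER_RE = re.compile(
    r"^(?P<dtype>.+?)[\s._-]*(?P<chip>\d{10})[\s._-]*(?P<pos>[A-L])$"
)


def parse_column(name):
    """Split one data column name into (data type, chip number, position)."""
    match = HEADER_RE.match(name.strip())
    if match is None:
        raise ValueError(f"Unrecognized data column name: {name!r}")
    return match.group("dtype"), match.group("chip"), match.group("pos")


def group_columns(header):
    """Map a raw bioassay name to {data type: column index}.

    The first column holds the probe IDs and is skipped. The bioassay
    name '<chip>_<position>' is a hypothetical naming choice.
    """
    arrays = {}
    for index, name in enumerate(header[1:], start=1):
        dtype, chip, pos = parse_column(name)
        arrays.setdefault(f"{chip}_{pos}", {})[dtype] = index
    return arrays


def fake_position(row_number):
    """Fake spot coordinates: block=1, column=1, row=row number in file."""
    return {"block": 1, "column": 1, "row": row_number}


if __name__ == "__main__":
    # Invented header line; a real file may look different.
    header = [
        "PROBE_ID",
        "AVG_Signal-1234567890_A", "BEAD_STDEV-1234567890_A",
        "AVG_Signal-1234567890_B", "BEAD_STDEV-1234567890_B",
    ]
    for bioassay, columns in group_columns(header).items():
        print(bioassay, columns)
    print(fake_position(3))
```

Grouping by chip number and position letter keeps all data columns of one array together, which is what an importer would need in order to create one raw bioassay per array.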
Re: [base] BASE 2 Illumina arrays
On 8/1/07, Nicklas Nordborg [EMAIL PROTECTED] wrote:

> Jeremy Davis-Turak wrote:
>> Hi Nicklas,
>> I uploaded some data files to the Ticket.
>
> Thanks a lot! That was exactly what we needed.
>
>> Here is a brief summary of what the data looks like:
>>
>> 1) Annotation data: CSV file. It's too bad that it's a CSV, because some of the fields contain commas!
>
> Hmmm... it looks hard to import this with existing importers. I'll have to start with the raw data import and leave this until later.

Yeah, what I did was just open it in Excel and save it as a .txt file. Not ideal, but the easiest way so far.

>> 2) Data: (header is on ~ line 8)
>>    a) For each set of chips that are processed at the same time, there is one resulting file. Thus, if you did two rat chips (each of which has 12 arrays on it), you would have 24 arrays contained in one file.
>>    b) Depending on the settings of the software at the time of scanning, you can have somewhere from 1-8 data columns per array (I don't know the exact range, but I know that it's variable).
>>    c) The first column contains the probe IDs, the rest of them are data.
>>    d) Each data column name is a concatenation of 3 things:
>>       i) The data type (e.g. 'AVG_Signal' or 'BEAD_STDEV')
>>       ii) The chip number (10 digits)
>>       iii) A capital letter indicating the position of the array on the chip (i.e. A-F for human, A-H for mouse, or A-L for rat.)
>>
>> EXAMPLE: the first 8 columns in my rat file are:
>
> It should be rather easy to create the raw bioassays. Once we have found the column headers, we can extract the chip number and the capital letter and use them as the name for each raw bioassay. The remaining parts of the headers should be easy to map to raw data properties (since you have already done this in the raw-data-types.xml for us).
>
> Do we have to worry about messed up files? For example, if there are AVG_Signal and BEAD_STDEV columns for one data set but only AVG_Signal for another?

I haven't encountered any messed up files yet. However, I think it would be easy to catch them, since you would be parsing the headers anyway. I don't know if other people have files like that which they wish to import, but for me, I would want the plugin to throw an error in that case.

> We could simply stop there and let the users revert to manual work if they need to connect the imported raw bioassays with scans, array designs and experiments, but I think we can do a little bit more. I just have a few questions.
>
> Should all raw bioassays be associated with a single scan (and thus the same hybridization) or do we need to associate the raw bioassays from each chip with a separate scan and hybridization?

I'll get back to you later to confirm this, but I believe I made one hybridization and one scan. For us, this made most sense because it models what actually goes on. I don't know what other groups prefer, or if they require any functionality that is lost by having only one hyb.

> It is difficult to associate the raw bioassays with array designs, since there are no spot coordinates in the file. We could fake this and use block=1, column=1 and row=row number in file. The benefit is that analysis will behave better if all raw bioassays are associated with the same array design. The drawback is that we must also fake the array design in the same way. It should be possible to use the existing ReporterMapImporter for this if we feed it the same raw data file.

We don't actually use array designs at this point, so I'm not sure how to address this. Faking it sounds fine to me. However, as a side note, the reason there is no spot info is that for Illumina, each array on each chip is different! The scanning software reads in a set of files which contain the array designs, and spits out the gene_profile.csv file, which is actually the data AVERAGED over all the beads for each probe. So, if someone REALLY wanted to get into the deeper level of analysis (bead-level), they would have to upload some additional files (which I've never dealt with). Thus, I recommend not dealing with that layer just yet.

> I am also thinking of the possibility of using the plug-in from the Experiment view page if the experiment is of the 'illumina' data type. Then, the raw bioassays created by the import could be assigned to the experiment by the plug-in, saving yet another manual step.

That seems cool. Would it then be easy to extend this feature to all data types?

>> Thanks for making this plugin!
>
> Well... it is not implemented yet...
>
> The files will be put in a protected repository that is only available to the core developers. Since you uploaded the files to our Trac I assume that you are not worried about other users seeing them. Is it ok to use some of the files in our regular test programs? They will not be included in the binary distribution, only in the source distribution and of course from direct subversion access.

Yes, you can use those data in your
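
The behaviour asked for in this reply, an error when the arrays in one file do not all carry the same data columns, might be sketched in Python as follows. This is an illustration under assumptions, not the plug-in's actual code: the function name and the input shape (a mapping from a hypothetical array name to the data types found for it while parsing the headers) are invented for the example.

```python
def check_consistent_columns(data_types_by_array):
    """Raise ValueError unless every array has the same set of data types.

    data_types_by_array maps an array name (hypothetical, e.g. chip number
    plus position letter) to the data types found in its column headers.
    """
    if not data_types_by_array:
        return
    names = sorted(data_types_by_array)
    # Use the first array (in sorted order) as the reference set.
    expected = set(data_types_by_array[names[0]])
    mismatched = {
        name: sorted(data_types_by_array[name])
        for name in names
        if set(data_types_by_array[name]) != expected
    }
    if mismatched:
        raise ValueError(
            f"Inconsistent data columns: expected {sorted(expected)}, "
            f"but found {mismatched}"
        )


if __name__ == "__main__":
    # Invented example: array B lacks the BEAD_STDEV column.
    columns = {
        "1234567890_A": {"AVG_Signal", "BEAD_STDEV"},
        "1234567890_B": {"AVG_Signal"},
    }
    try:
        check_consistent_columns(columns)
    except ValueError as error:
        print("Import rejected:", error)
```

Failing the whole import, rather than importing a partial set of columns, matches the preference stated above; a more lenient importer could instead warn and skip the incomplete arrays.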
[base] Unable to view BioMaterials items after upgrade to 2.3.2
Hello All,

I've attempted an upgrade of my base-2.2.2 installation to base-2.3.2 multiple times, but each time I am unable to view items listed in the BioMaterials table using the Samples, Extracts, or Label Extracts web pages. Almost all the icons for managing items on these pages are missing as well, such as the ones that say New, Delete, Restore, Share, etc., as well as the preset filter, number of records to see, etc.

The Samples page has one little stub of an icon for New. If I click this I can bring up the New Sample dialog box and create a sample that is written to the MySQL BioMaterials table, but the new item fails to show up on the Samples page.

My installation is pretty standard (Red Hat 3.4.6-8, apache-tomcat-5.5.20, MySQL-server-standard-5.0.27-0, jdk 1.6.0). I have been following the documentation instructions to the letter, using updatedb.sh and updateindexes.sh, and I edited base.config to match my system. Prior to the upgrade, I had modified the extended-properties.xml and raw-data-types.xml files, but they worked fine with 2.2.2.

Any ideas on what could be going wrong?

Thanks,
Jim Collett

James R. Collett, Ph.D.
Systems Biology Fellow
Battelle Memorial Institute
Pacific Northwest National Laboratory