Jeremy Davis-Turak wrote:
>>> 1) Annotation data: CSV file. It's too bad that it's a CSV, because
>>> some of the fields contain commas!
>> Hmmm... it looks hard to import this with existing importers. I'll have
>> to start with the raw data import and leave this until later.
>>
>
> Yeah, what I did was just open it in excel and save it as a .txt file.
> Not ideal, but the easiest way so far.
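
To make the comma problem quoted above concrete, here is a minimal sketch in Python. It is only an illustration, not part of BASE or of the plug-in: the sample row and its identifier are made up, and the helper names are hypothetical. It compares a naive split on every comma with Python's standard csv module, which treats commas inside double quotes as data.

```python
import csv
import io

# Hypothetical annotation row (not taken from the real annotation file):
# the second field contains a comma inside double quotes.
SAMPLE_ROW = 'ILMN_0001,"kinase, putative",chr1'


def naive_split(line):
    """Split on every comma, ignoring quotes. This breaks quoted fields."""
    return line.split(",")


def quote_aware_split(line):
    """Parse one CSV line, keeping commas inside double quotes as data."""
    return next(csv.reader(io.StringIO(line)))


if __name__ == "__main__":
    print(naive_split(SAMPLE_ROW))        # the quoted field is cut in two
    print(quote_aware_split(SAMPLE_ROW))  # three fields, quotes removed
```

The naive split yields four pieces for a three-column row, which is why a plain comma-separated importer cannot be used on this file as-is.
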

I searched the net and found a regexp that can split this kind of data
properly:

,(?=(?:[^"]*"[^"]*")*(?![^"]*"))

I have not tested it with the annotation file yet, but if it works I'll
add it as a preset in the "Test with file" function. (Thanks to Raimond
Brookman http://blogs.infosupport.com/raimondb/archive/2005/04/27/199.aspx)

>> Do we have to worry about messed up files? For example, if there is
>> AVG_Signal and BEAD_STDEV columns for one data set but only AVG_Signal
>> for another?
>
> I haven't encountered any messed up files yet. However, I think it
> would be easy to catch them, since you would be parsing the headers
> anyway. I don't know if other people have files like that, which they
> wish to import, but for me, I would want the plugin to throw an error
> in that case.

I'll include a check for this.

>> We could simply stop there and let the users revert to manual work if
>> they needed to connect the imported raw bioassays with scans, array
>> designs and experiments, but I think we can do a little bit more. I just
>> have a few questions.
>>
>> Should all raw bioassays be associated with a single scan (and thus the
>> same hybridization) or do we need to associate the raw bioassays from
>> each chip with separate scan and hybridization?
>>
>
> I'll get back to you later to confirm this, but I believe I made one
> hybridization and one scan. For us, this made most sense because it
> models what actually goes on. I don't know what other groups prefer,
> or if they require any functionality that is lost by having only one
> hyb.

I'll go for the simple solution in the first version, which is to let
the user select one scan, one protocol and one software that is
associated with all raw bioassays.

>> It is difficult to associate the raw bioassays with array designs, since
>> there are no spot coordinates in the file. We could fake this and use
>> block=1, column=1 and row=row number in file.
>> The benefit is that analysis will behave better if all raw bioassays
>> are associated with the same array design. The drawback is that we
>> must also fake the array design in the same way. It should be possible
>> to use the existing ReporterMapImporter for this if we feed it the
>> same raw data file.
>>
>
> We don't actually use array designs at this point, so I'm not sure how
> to address this. Faking it sounds fine to me.

I'll skip this part for now. If there is time left over when the rest of
the functionality is implemented I might give it another shot.

> However, as a side note, the reason there is no spot info is that for
> Illumina, each array on each chip is different! The scanning
> software reads in a set of files which contain the array designs, and
> spits out the "gene_profile.csv" file, which is actually the data
> AVERAGED over all the beads for each probe. So, if someone REALLY
> wanted to get into the deeper level of analysis (bead-level), they
> would have to upload some additional files (which I've never dealt
> with). Thus, I recommend not dealing with that layer just yet.

Ok, this sounds almost like the same setup as for Affymetrix files. We
have made a special solution which stores the data in the original files
and not in the database. In BASE 2.5 we hope to create a generic
solution for this.

>
>> I am also thinking of the possibility of using the plug-in from the
>> Experiment view page if the experiment is of the 'illumina' data type.
>> Then, the raw bioassays created by the import could be assigned to the
>> experiment by the plug-in, saving yet another manual step.
>>
>
> That seems cool. Would it be then easy to extend this feature to all
> data types?

It would not be as useful since only one data set can be imported at a
time.

>> Since you uploaded the files to our Trac I assume that you are not
>> worried about other users seeing them. Is it ok to use some of the files
>> in our regular test programs?
>> They will not be included in the binary distribution, only in the
>> source distribution and of course from direct subversion access.
>
> Yes, you can use those data in your test files.

Thanks.

/Nicklas

_______________________________________________
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to [EMAIL PROTECTED]