Bob MacCallum wrote:
Hi,
I just spent the afternoon getting to know the array design and raw data
import into BASE2 - starting with GenePix format - and have come across a few
things.
I'm using BASE 2.2.2 (build #3172; schema #30). I have looked through the
fixes for 2.2.3 and decided not to upgrade - otherwise I'll just spend my
whole life upgrading BASE... ;-)
1. In the view files page, the type menu has a blank entry for 'raw data',
although this still seems to work. This might be fixed in 2.2.3;
see http://base.thep.lu.se/ticket/559, which looks related.
Yes, I think this is the same thing.
2. I think there's some inconsistent handling of trailing spaces in the
reporter ID column of a GenePix .gpr file. For example, I can import
reporters and create an array design from the file pasted below, but I
can't then import the raw data!
(the following is just 8 lines long - if the long lines get mangled, I'll send
a copy by mail on request)
ATF 1.0
27	43
Type=GenePix Results 1.4
Block	Column	Row	Name	ID	X	Y	Dia.	F635 Median	F635 Mean	F635 SD	B635 Median	B635 Mean	B635 SD	% > B635+1SD	% > B635+2SD	F635 % Sat.	F532 Median	F532 Mean	F532 SD	B532 Median	B532 Mean	B532 SD	% > B532+1SD	% > B532+2SD	F532 % Sat.	Ratio of Medians	Ratio of Means	Median of Ratios	Mean of Ratios	Ratios SD	Rgn Ratio	Rgn R²	F Pixels	B Pixels	Sum of Medians	Sum of Means	Log Ratio	F635 Median - B635	F532 Median - B532	F635 Mean - B635	F532 Mean - B532	Flags
1	1	1	demoA	demorep1	1690	5730	110	183	181	42	59	62	25	100	98	0	276	270	48	64	65	13	100	100	0	0.585	0.592	0.570	0.576	1.357	0.591	0.782	80	621	336	328	-0.774	124	212	122	206	0
1	2	1	demoB	demorep2 	1910	5730	120	114	137	175	57	61	37	71	21	0	346	341	80	63	65	35	96	95	0	0.201	0.288	0.192	0.209	2.379	0.398	0.094	120	716	340	358	-2.312	57	283	80	278	0
1	3	1	demoC	demorep3	2110	5740	110	145	148	43	63	68	30	92	68	0	208	214	48	69	74	43	98	93	0	0.590	0.586	0.599	0.541	1.987	0.504	0.582	80	566	221	230	-0.761	82	139	85	145	0
1	4	1	demoD	demorep4	2300	5730	110	185	187	51	59	63	23	100	96	0	298	294	57	64	67	24	100	98	0	0.538	0.557	0.526	0.538	1.599	0.549	0.730	80	590	360	358	-0.893	126	234	128	230	0
The stack trace from the raw data import is:
net.sf.basedb.core.BaseException: Item not found: Reporter mismatch: The feature has reporter 'demorep2' whereas you have given 'demorep2 ' on line 6: 1 2 1 demoB de...
	at net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:592)
	at net.sf.basedb.plugins.AbstractFlatFileImporter.run(AbstractFlatFileImporter.java:442)
	at net.sf.basedb.core.PluginExecutionRequest.invoke(PluginExecutionRequest.java:88)
	at net.sf.basedb.core.InternalJobQueue$JobRunner.run(InternalJobQueue.java:420)
	at java.lang.Thread.run(Thread.java:619)
Caused by: net.sf.basedb.core.ItemNotFoundException: Item not found: Reporter mismatch: The feature has reporter 'demorep2' whereas you have given 'demorep2 '
	at net.sf.basedb.core.RawDataBatcher.doInsert(RawDataBatcher.java:390)
	at net.sf.basedb.core.RawDataBatcher.insert(RawDataBatcher.java:343)
	at net.sf.basedb.plugins.RawDataFlatFileImporter.handleData(RawDataFlatFileImporter.java:544)
	at net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:570)
	... 4 more
I think BASE1 was more tolerant.
Leading and trailing blanks are trimmed from almost all values before
they are inserted in the database, and that explains why the stored
reporter is 'demorep2' rather than 'demorep2 '. I guess we never thought
of doing the same when checking whether a reporter (or something else
with a unique value) exists in the database. I think several other places
are affected by the same thing. I'll add this as a bug in our Trac database.
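The mismatch arises because trimming is applied when values are stored but not when they are looked up. Here is a minimal Java sketch of applying one normalization rule on both sides. It assumes an in-memory map as a stand-in for the reporter table; ReporterLookup, normalize(), add() and find() are hypothetical names, not part of the BASE API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the reporter table: external ID -> internal ID.
public class ReporterLookup {
    private final Map<String, Integer> idsByExternalId = new HashMap<>();

    // One normalization rule, used both when storing and when looking up.
    static String normalize(String externalId) {
        return externalId == null ? null : externalId.trim();
    }

    public void add(String externalId, int internalId) {
        idsByExternalId.put(normalize(externalId), internalId);
    }

    // Returns the internal ID, or null if no reporter matches.
    public Integer find(String externalId) {
        return idsByExternalId.get(normalize(externalId));
    }

    public static void main(String[] args) {
        ReporterLookup lookup = new ReporterLookup();
        lookup.add("demorep2 ", 2); // stored trimmed, as "demorep2"

        System.out.println(lookup.find("demorep2 ")); // 2
        System.out.println(lookup.find("demorep2"));  // 2
    }
}
```

Removing the normalize() call from find() reproduces the reported behaviour: find("demorep2 ") would then return null even though the reporter exists.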
In the meantime, you can try using a splitter regexp that also removes
whitespace. Try something like \s*\t\s* instead of just \t. I have not
tested this, but it might be enough to make it work.
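To show what the suggested splitter changes, here is a small Java sketch that applies String.split to the first six columns of the failing row (line 6 of the file). SplitterDemo and its split() helper are illustrative only; the sketch does not show how BASE applies the splitter internally.

```java
public class SplitterDemo {
    // Splits one line with the given regexp; limit -1 keeps trailing empty columns.
    static String[] split(String line, String regexp) {
        return line.split(regexp, -1);
    }

    public static void main(String[] args) {
        // First six columns of line 6; the ID column has a trailing blank.
        String line = "1\t2\t1\tdemoB\tdemorep2 \t1910";

        String[] plain = split(line, "\\t");            // plain tab
        String[] trimming = split(line, "\\s*\\t\\s*"); // the suggested splitter
        String[] spacesOnly = split(line, " *\\t *");   // strips spaces, not tabs

        System.out.println("[" + plain[4] + "]");      // [demorep2 ]
        System.out.println("[" + trimming[4] + "]");   // [demorep2]
        System.out.println("[" + spacesOnly[4] + "]"); // [demorep2]

        // Caveat: \s also matches a tab, so an empty column ("a\t\tb") is
        // swallowed by the suggested pattern but kept by the spaces-only one.
        System.out.println(split("a\t\tb", "\\s*\\t\\s*").length); // 2
        System.out.println(split("a\t\tb", " *\\t *").length);     // 3
    }
}
```

Because \s also matches a tab, \s*\t\s* treats two consecutive tabs as a single separator. An empty column would therefore shift every later column one place to the left. If a file can contain empty columns, a pattern that strips only spaces, such as " *\t *", avoids this. That pattern has not been tested against BASE either.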
3. Case sensitivity in the reporter ID (external ID) column
I get: Error: Duplicate entry 'demoBLANK' for key 2
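The thread does not say why this duplicate occurs. The sketch below assumes that the database compares reporter IDs case-insensitively. MySQL's default *_ci collations behave this way, and "Duplicate entry '...' for key N" is MySQL's duplicate-key message. Under that assumption, IDs that differ only in letter case would collide on the unique key. Here is a hypothetical pre-import check in Java that lists such pairs; CaseCollisionCheck and findCaseCollisions() are illustrative names, and the sample IDs are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseCollisionCheck {
    // Returns "first / second" for each ID that matches an earlier one
    // only when letter case is ignored. Exact repeats are not reported.
    static List<String> findCaseCollisions(List<String> externalIds) {
        Map<String, String> firstSeen = new HashMap<>();
        List<String> collisions = new ArrayList<>();
        for (String id : externalIds) {
            String trimmed = id.trim();
            String key = trimmed.toLowerCase(Locale.ROOT);
            String previous = firstSeen.putIfAbsent(key, trimmed);
            if (previous != null && !previous.equals(trimmed)) {
                collisions.add(previous + " / " + trimmed);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Hypothetical IDs; the thread does not show which IDs collided.
        List<String> ids = Arrays.asList("demoBLANK", "demorep1", "demoblank");
        System.out.println(findCaseCollisions(ids)); // [demoBLANK / demoblank]
    }
}
```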