Re: [base] some base 2 bugs/features (raw data and array design import)

2007-04-24 Thread Nicklas Nordborg
   In the meantime you can try using a splitter regexp that also removes 
   white-space. Try something like \s*\t\s* instead of just \t. I have not 
   tested this but it might be enough to make it work.
 
 I guessed there would be a neat trick like this, but couldn't think of it last
 night.
 
 However I tried \s*\t\s*
 and "?\s*\t\s*"? which also needs Block = \"Block\" and Flags = \"Flags\"
 
 and they both oddly give the same error as before: 
 Error: Item not found: Reporter mismatch: The feature has reporter 'demorep2'
 whereas you have given 'demorep2 ' on line 6: 1 2 1 demoB de...
 

Ok, I checked your example data again and found that there are quotes 
around the values and the space is inside the quotes. This makes it more 
problematic since the splitter regexp also removes the quotes between 
the values, but not the first and last ones on the line.

I still think it is possible to create a regexp that can do the work but 
I am afraid that it will not be very simple.

I think we need a trim whitespace option that works similar to the 
remove quotes option for the importer plugins.
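
To illustrate the point about quotes, here is a minimal Python sketch. It is not BASE code: the quoted sample line is an assumption about what the file looks like, and `parse_line` with its `remove_quotes` and `trim_whitespace` parameters is a hypothetical stand-in for the importer options discussed above. It shows that a quote-aware splitter regexp leaves the outermost quotes and the inner trailing space in place, while splitting first and then unquoting and trimming each value gives the expected ID.

```python
import re

# Assumed layout: every value quoted, trailing space inside the quotes.
QUOTED_LINE = '"1"\t"2"\t"1"\t"demoB"\t"demorep2 "'


def split_with_regexp(line, splitter):
    """Split a line using only a splitter regexp."""
    return re.split(splitter, line)


def parse_line(line, remove_quotes=True, trim_whitespace=True):
    """Split on tabs, then optionally unquote and trim each value.

    Hypothetical helper; the option names are illustrative only.
    """
    cleaned = []
    for value in line.split("\t"):
        if remove_quotes and len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        if trim_whitespace:
            value = value.strip()
        cleaned.append(value)
    return cleaned


# The regexp eats the quotes between values but not the outer ones,
# and the space inside the last pair of quotes survives.
print(split_with_regexp(QUOTED_LINE, r'"?\s*\t\s*"?'))
# ['"1', '2', '1', 'demoB', 'demorep2 "']

# Unquoting first and trimming afterwards removes the space as well.
print(parse_line(QUOTED_LINE))
# ['1', '2', '1', 'demoB', 'demorep2']
```

The order matters: trimming before the quotes are removed would not touch a space that sits inside them.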

   This problem is affected by how the database handles strings. MySQL is 
   case-insensitive. Postgres on the other hand is case-sensitive and the 
   same problem would never have appeared. The important question is if the 
   demoblank and demoBLANK should be treated as the same reporters or not?
   
   In Postgres they are already treated as different and it would be rather 
   hard to change that. The only way is to convert all IDs to the same 
   case before storing them in the database.
   
   In MySQL they are treated as the same and it is equally hard to change 
   that. The problem appears here because the two reporters are in the same 
   file. If there had been two different raw data files, both demoblank 
   and demoBLANK would have mapped to the same reporter. The bug in our 
   code is that when the lines are in the same file we do case-sensitive 
   comparison to check what has already been inserted. I'll add a ticket 
   for that as well.

I just wanted to mention that if we solve this problem as described 
above BASE will behave differently on Postgres and MySQL. On MySQL all 
reporters will be case-insensitive and demoblank/demoBLANK will be 
the same reporter. On Postgres they will be two different reporters.
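
A small Python sketch of the in-file check described above. This is an illustration only, not the BASE implementation: `find_in_file_duplicates` and its `case_sensitive` flag are hypothetical names, and lower-casing is used here as a rough stand-in for a case-insensitive database comparison.

```python
def find_in_file_duplicates(reporter_ids, case_sensitive):
    """Return (first_seen, duplicate) pairs among the IDs of one file.

    With case_sensitive=True the check behaves like an exact string
    comparison; with case_sensitive=False the IDs are compared after
    lower-casing, roughly as a case-insensitive database would.
    """
    seen = {}
    duplicates = []
    for reporter_id in reporter_ids:
        key = reporter_id if case_sensitive else reporter_id.lower()
        if key in seen:
            duplicates.append((seen[key], reporter_id))
        else:
            seen[key] = reporter_id
    return duplicates


ids = ["demoblank", "demoBLANK", "demorep1"]

# Exact comparison: nothing is flagged, so both IDs would be inserted
# and a case-insensitive unique key would reject the second one.
print(find_in_file_duplicates(ids, case_sensitive=True))
# []

# Case-insensitive comparison: the clash is detected before inserting.
print(find_in_file_duplicates(ids, case_sensitive=False))
# [('demoblank', 'demoBLANK')]
```

If the in-file check matched the database's own comparison rule, the second ID would map to the existing reporter on a case-insensitive database and become a separate reporter on a case-sensitive one, which is the difference in behaviour mentioned above.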

 Time to balance the negative with some positive...  BASE2 is so much nicer to
 work with than BASE1, keep up the great work guys!!

Thanks!

/Nicklas

___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject unsubscribe to
[EMAIL PROTECTED]


Re: [base] some base 2 bugs/features (raw data and array design import)

2007-04-23 Thread Nicklas Nordborg
Bob MacCallum wrote:
 Hi,
 
 I just spent the afternoon getting to know the array design and raw data
 import into BASE2 - starting with genepix format - and have come across a few
 things.
 
 I'm using BASE 2.2.2 (build #3172; schema #30).  I have looked through the
 fixes for 2.2.3 and decided not to upgrade - otherwise I'll just spend my
 whole life upgrading BASE... ;-)
 
 1. In the view files page, the type menu has a blank entry for 'raw data'
although this still seems to work.  This might be fixed in 2.2.3
see http://base.thep.lu.se/ticket/559 which looks related.

Yes, I think this is the same thing.


 2. I think there's some inconsistent handling of trailing spaces in the
reporter ID column of a genepix .gpr file.  For example I can import
reporters, and create an array design from the file pasted below, but I
can't then import the raw data!
 
 (the following is just 8 lines long - if the long lines get mangled, I'll send
 a copy by mail on request)
 
 ATF   1.0
 2743
 Type=GenePix Results 1.4
 Block  Column  Row  Name  ID  X  Y  Dia.  F635 Median  F635 Mean  F635 SD
    B635 Median  B635 Mean  B635 SD  % > B635+1SD  % > B635+2SD  F635 % Sat.
    F532 Median  F532 Mean  F532 SD  B532 Median  B532 Mean  B532 SD
    % > B532+1SD  % > B532+2SD  F532 % Sat.  Ratio of Medians  Ratio of Means
    Median of Ratios  Mean of Ratios  Ratios SD  Rgn Ratio  Rgn R²  F Pixels
    B Pixels  Sum of Medians  Sum of Means  Log Ratio  F635 Median - B635
    F532 Median - B532  F635 Mean - B635  F532 Mean - B532  Flags
 1  1  1  demoA  demorep1  1690  5730  110  183  181  42  59  62  25  100  98  0
    276  270  48  64  65  13  100  100  0  0.585  0.592  0.570  0.576  1.357
    0.591  0.782  80  621  336  328  -0.774  124  212  122  206  0
 1  2  1  demoB  demorep2  1910  5730  120  114  137  175  57  61  37  71  21  0
    346  341  80  63  65  35  96  95  0  0.201  0.288  0.192  0.209  2.379
    0.398  0.094  120  716  340  358  -2.312  57  283  80  278  0
 1  3  1  demoC  demorep3  2110  5740  110  145  148  43  63  68  30  92  68  0
    208  214  48  69  74  43  98  93  0  0.590  0.586  0.599  0.541  1.987
    0.504  0.582  80  566  221  230  -0.761  82  139  85  145  0
 1  4  1  demoD  demorep4  2300  5730  110  185  187  51  59  63  23  100  96  0
    298  294  57  64  67  24  100  98  0  0.538  0.557  0.526  0.538  1.599
    0.549  0.730  80  590  360  358  -0.893  126  234  128  230  0
 
 
 the stacktrace from the raw data import is:
 
 net.sf.basedb.core.BaseException: Item not found: Reporter mismatch: The 
 feature has reporter 'demorep2' whereas you have given 'demorep2 ' on line 6: 
 1 2 1 demoB de...
 at 
 net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:592)
 at 
 net.sf.basedb.plugins.AbstractFlatFileImporter.run(AbstractFlatFileImporter.java:442)
 at 
 net.sf.basedb.core.PluginExecutionRequest.invoke(PluginExecutionRequest.java:88)
 at 
 net.sf.basedb.core.InternalJobQueue$JobRunner.run(InternalJobQueue.java:420)
 at java.lang.Thread.run(Thread.java:619)
 Caused by: net.sf.basedb.core.ItemNotFoundException: Item not found: Reporter 
 mismatch: The feature has reporter 'demorep2' whereas you have given 
 'demorep2 '
 at net.sf.basedb.core.RawDataBatcher.doInsert(RawDataBatcher.java:390)
 at net.sf.basedb.core.RawDataBatcher.insert(RawDataBatcher.java:343)
 at 
 net.sf.basedb.plugins.RawDataFlatFileImporter.handleData(RawDataFlatFileImporter.java:544)
 at 
 net.sf.basedb.plugins.AbstractFlatFileImporter.doImport(AbstractFlatFileImporter.java:570)
 ... 4 more
 
 
 I think BASE1 was more tolerant.

Leading and trailing blanks are trimmed from more or less all values 
before they are inserted into the database, which explains why the stored 
value is 'demorep2' rather than 'demorep2 '. I guess we never thought of 
doing the same when checking whether a reporter (or something else with a 
unique value) exists in the database. I think several other places are 
affected by the same thing. I'll add this as a bug in our trac database. 
In the meantime you can try using a splitter regexp that also removes 
white-space. Try something like \s*\t\s* instead of just \t. I have not 
tested this but it might be enough to make it work.
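
As a rough illustration of the suggested splitter, here is a short Python sketch. It uses Python's `re` module, not the BASE importer, and the sample lines and the `split_fields` helper are made up for the example. It compares a plain \t splitter with \s*\t\s* on an unquoted line whose ID value has a trailing space.

```python
import re

# Made-up unquoted sample: the ID value "demorep2 " has a trailing space.
SAMPLE_LINE = "1\t2\t1\tdemoB\tdemorep2 \t1910"


def split_fields(line, splitter):
    """Split one line of a flat file with the given splitter regexp."""
    return re.split(splitter, line)


# Plain tab splitter: the trailing space stays part of the value.
print(split_fields(SAMPLE_LINE, r"\t")[4] == "demorep2 ")         # True

# Whitespace-eating splitter: the value comes out trimmed.
print(split_fields(SAMPLE_LINE, r"\s*\t\s*")[4] == "demorep2")    # True

# Caveat: \s also matches a tab, so two adjacent tabs (an empty
# column) are swallowed as a single separator.
print(split_fields("a\t\tb", r"\s*\t\s*"))    # ['a', 'b']
print(split_fields("a\t\tb", r"[ ]*\t[ ]*"))  # ['a', '', 'b']
```

If the file can contain empty columns, a splitter that only eats spaces around the tab, such as [ ]*\t[ ]*, keeps the column count intact.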

 
 3. case sensitivity in the reporter ID (external id) column
 
   I get Error: Duplicate entry 'demoBLANK' for key 2