Hi, Andreas, Thanks for report. You are right that the current version of FastBit does not anything other than exclude the null values from the answers. It appears that in the examples you've given, we have failed to exclude the null values. Our intent was to return only three rows (instead of four) in your example queries. This would give the correct answers in some cases, but does not conform to the SQL standard.
We should be able to fix the bug the current code, i.e., exclude the null values from the output, in the next day or so. I will let you know as soon as we have a fix checked into the SVN. To correctly handle the null values in the results will take some more time, probably a couple of weeks. If you have any suggestion on how we should handle the null values in the results, feel free to share your opinions. If you would like to proceed with your own work without waiting for this new feature, as you have indicated, you can choose a special value to designate as null for your application and let FastBit treat it as any other values. Your approach would be just as fast as we can do. By the way, with the internal data structures the way they are right now, having your own null values might be a faster approach than having a standard conforming null values in FastBit. Thanks. John On 1/11/2010 4:52 AM, Andreas Streichardt wrote: > On Wednesday 23 December 2009 21:10:00 K. John Wu wrote: >> Hi, Andreas, > > Hi John, > >> The code for treating blank fields in CSV files as NULL values have >> been checked into the SVN repository. For example, the last two >> fields of the first of the two following lines will be treated as NULL >> values when you tell ardea.cpp that they are numerical values, >> >> school, , >> school, 1, 130000 >> >> For string-valued columns, there is some ambiguity as whether you >> intend to have a blank string or it is something missing. For the >> moment, we treat a blank string as as the user really wanted a blank >> string -- i.e., not NULL. >> >> You should be able to get the night snapshot at >> <https://codeforge.lbl.gov/snapshots.php?group_id=44>. >> Please let us know how it works out for you if you do have a chance to >> try it. > > Thanks for the quick reply and sorry for the late answer ;) Been away the last > weeks. I just gave it a try and i wonder what the intended result should be. > > This is my testcase: > > hans,1,2,3,,5 > peter,1,,3,4,5 > maus,1,2,3,,5 > ,1,2,3,4,5 > > This is how i loaded the data with ardea: > > ardea -d null_test -m "a:text,b:int,c:int,d:int,e:int,f:int" -t nulls.csv > > ls null_test/ > a b c c.msk d e e.msk f -part.txt > > So that worked (i think i have read somewhere that the msk files are nullmask > files). > > However. When i am doing an AVG() calculation on c this is the result: > > ibis -d null_test -n -q "SELECT AVG(c) WHERE 1=1" > > tableSelect -- select(AVG(c), 1=1) on table T-null_test produced a table with > 1 row and 2 columns > tableSelect -- the result table (1 x 2) for "SELECT AVG(c) FROM T-null_test > WHERE 1=1" > 536870913.25, 4 > > This is the result for a COUNT: > > c (with counts) > 2, 3 > 2147483647, 1 > > Then i tried to let ibis handle ignore 2147483647 when doing the AVG: > > SELECT AVG(c) WHERE c!=2147483647 > > 536870913.25, 4 > > So it seems that ardea can now successfully load null values but there is no > possibility to calculate against them using ibis. Is my observation correct? > > What impact are the NULL values supposed to have on calculations in fastbit > currently and in the future? > > Unless i misunderstood something here i guess i will simply assign an > arbitrary number to NULL values and handle them using adjusted WHERE > statements. > >> >> John > > Kind regards, > > Andreas Streichardt > _______________________________________________ > FastBit-users mailing list > [email protected] > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
