Hi, Andreas, I have just checked in a fix to exclude the null values from the results. In the example
select avg(c) from null_test where 1=1; the output would look something like the following tableSelect -- the result table (1 x 2) for "SELECT avg(c) FROM T-null_test WHERE 1=1" 2, 3 The results only include three rows; the row where c is null is excluded. If you are fine with this way of handling null values, please give it a try and let us know if you encounter any problems. If you would rather see null values handle properly, i.e., like to see the result having 4 rows, have 'select avg(c), count(*)' print out something like the following 2, 3 , 1 (note the null value printed on the second row) let us know, we will put this feature request on the list of things to do. John PS: As usual, you should be able to download a nightsnap of the current code from <https://codeforge.lbl.gov/snapshots.php?group_id=44>. The new snapshot is built around 2-3AM pacific time zone. To verify the snapshot you've download has this fix, examine the top of file ChangeLog, this change was logged as 2010-01-11 John Wu <John.Wu at ACM.org> * countQuery.cpp: add calls to m_sel->getNullMask to exclude null values in the select clause On 1/11/2010 4:52 AM, Andreas Streichardt wrote: > On Wednesday 23 December 2009 21:10:00 K. John Wu wrote: >> Hi, Andreas, > > Hi John, > >> The code for treating blank fields in CSV files as NULL values have >> been checked into the SVN repository. For example, the last two >> fields of the first of the two following lines will be treated as NULL >> values when you tell ardea.cpp that they are numerical values, >> >> school, , >> school, 1, 130000 >> >> For string-valued columns, there is some ambiguity as whether you >> intend to have a blank string or it is something missing. For the >> moment, we treat a blank string as as the user really wanted a blank >> string -- i.e., not NULL. >> >> You should be able to get the night snapshot at >> <https://codeforge.lbl.gov/snapshots.php?group_id=44>. >> Please let us know how it works out for you if you do have a chance to >> try it. > > Thanks for the quick reply and sorry for the late answer ;) Been away the last > weeks. I just gave it a try and i wonder what the intended result should be. > > This is my testcase: > > hans,1,2,3,,5 > peter,1,,3,4,5 > maus,1,2,3,,5 > ,1,2,3,4,5 > > This is how i loaded the data with ardea: > > ardea -d null_test -m "a:text,b:int,c:int,d:int,e:int,f:int" -t nulls.csv > > ls null_test/ > a b c c.msk d e e.msk f -part.txt > > So that worked (i think i have read somewhere that the msk files are nullmask > files). > > However. When i am doing an AVG() calculation on c this is the result: > > ibis -d null_test -n -q "SELECT AVG(c) WHERE 1=1" > > tableSelect -- select(AVG(c), 1=1) on table T-null_test produced a table with > 1 row and 2 columns > tableSelect -- the result table (1 x 2) for "SELECT AVG(c) FROM T-null_test > WHERE 1=1" > 536870913.25, 4 > > This is the result for a COUNT: > > c (with counts) > 2, 3 > 2147483647, 1 > > Then i tried to let ibis handle ignore 2147483647 when doing the AVG: > > SELECT AVG(c) WHERE c!=2147483647 > > 536870913.25, 4 > > So it seems that ardea can now successfully load null values but there is no > possibility to calculate against them using ibis. Is my observation correct? > > What impact are the NULL values supposed to have on calculations in fastbit > currently and in the future? > > Unless i misunderstood something here i guess i will simply assign an > arbitrary number to NULL values and handle them using adjusted WHERE > statements. > >> >> John > > Kind regards, > > Andreas Streichardt > _______________________________________________ > FastBit-users mailing list > [email protected] > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
