Hi, Andreas,

I have just checked in a fix to exclude the null values from the 
results.  In the example

select avg(c) from null_test where 1=1;

the output would look something like the following

tableSelect -- the result table (1 x 2) for "SELECT avg(c) FROM 
T-null_test WHERE 1=1"
2, 3

The results only include three rows; the row where c is null is 
excluded.  If you are fine with this way of handling null values, 
please give it a try and let us know if you encounter any problems. 
If you would rather see null values handle properly,  i.e., like to 
see the result having 4 rows, have 'select avg(c), count(*)' print out 
something like the following

2, 3
, 1

(note the null value printed on the second row)
let us know, we will put this feature request on the list of things to do.

John

PS: As usual, you should be able to download a nightsnap of the 
current code from 
<https://codeforge.lbl.gov/snapshots.php?group_id=44>.  The new 
snapshot is built around 2-3AM pacific time zone.  To verify the 
snapshot you've download has this fix, examine the top of file 
ChangeLog, this change was logged as

2010-01-11 John Wu <John.Wu at ACM.org>
        * countQuery.cpp: add calls to m_sel->getNullMask to exclude
        null values in the select clause


On 1/11/2010 4:52 AM, Andreas Streichardt wrote:
> On Wednesday 23 December 2009 21:10:00 K. John Wu wrote:
>> Hi, Andreas,
>
> Hi John,
>
>> The code for treating blank fields in CSV files as NULL values have
>> been checked into the SVN repository.  For example, the last two
>> fields of the first of the two following lines will be treated as NULL
>> values when you tell ardea.cpp that they are numerical values,
>>
>> school,     ,
>> school,     1, 130000
>>
>> For string-valued columns, there is some ambiguity as whether you
>> intend to have a blank string or it is something missing.  For the
>> moment, we treat a blank string as as the user really wanted a blank
>> string -- i.e., not NULL.
>>
>> You should be able to get the night snapshot at
>> <https://codeforge.lbl.gov/snapshots.php?group_id=44>.
>> Please let us know how it works out for you if you do have a chance to
>> try it.
>
> Thanks for the quick reply and sorry for the late answer ;) Been away the last
> weeks. I just gave it a try and i wonder what the intended result should be.
>
> This is my testcase:
>
> hans,1,2,3,,5
> peter,1,,3,4,5
> maus,1,2,3,,5
> ,1,2,3,4,5
>
> This is how i loaded the data with ardea:
>
> ardea -d null_test -m "a:text,b:int,c:int,d:int,e:int,f:int" -t nulls.csv
>
> ls  null_test/
> a  b  c  c.msk  d  e  e.msk  f  -part.txt
>
> So that worked (i think i have read somewhere that the msk files are nullmask
> files).
>
> However. When i am doing an AVG() calculation on c this is the result:
>
> ibis -d null_test -n -q "SELECT AVG(c) WHERE 1=1"
>
> tableSelect -- select(AVG(c), 1=1) on table T-null_test produced a table with
> 1 row and 2 columns
> tableSelect -- the result table (1 x 2) for "SELECT AVG(c) FROM T-null_test
> WHERE 1=1"
> 536870913.25, 4
>
> This is the result for a COUNT:
>
> c (with counts)
> 2,      3
> 2147483647,     1
>
> Then i tried to let ibis handle ignore 2147483647 when doing the AVG:
>
> SELECT AVG(c) WHERE c!=2147483647
>
> 536870913.25, 4
>
> So it seems that ardea can now successfully load null values but there is no
> possibility to calculate against them using ibis. Is my observation correct?
>
> What impact are the NULL values supposed to have on calculations in fastbit
> currently and in the future?
>
> Unless i misunderstood something here i guess i will simply assign an
> arbitrary number to NULL values and handle them using adjusted WHERE
> statements.
>
>>
>> John
>
> Kind regards,
>
>       Andreas Streichardt
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to