Re: [FastBit-users] csv with NULLs

Michael Beauregard Thu, 01 Nov 2012 16:25:34 -0700

Thanks for the quick response.

I was afraid I you were going to suggest that. I think I over
simplified the query making it a poor example of what I intend to do.
I need something more like:


SELECT a, b, AVG(c), AVG(d), AVG(e), AVG(f), AVG(g), AVG(h), AVG(i),
AVG(j) WHERE a NOT IN (1, 2, 3) AND b NOT IN (3, 4, 5) AND <some other
constraints>

and columns 'c' through 'j' can have NULL values at any point.

Splitting that into 8 queries:

SELECT a, b, AVG(c) WHERE c NOT NULL AND ...
SELECT a, b, AVG(d) WHERE d NOT NULL AND ...
...

means that I'd have to somehow combine those results after the fact
and I'd like to avoid that if possible.

Could I perform the main query without specifying 'c' through 'j'
aggregations, then using the api to compute each average on the
resulting table so that the non-aggregation part of the query doesn't
have to be repeated? Like:

SELECT a, b WHERE a NOT IN (1, 2, 3) AND b NOT IN (3, 4, 5) AND <some
other constraints>

Then use the resulting table instance to compute aggregations AVG('c')
through AVG('j').

Or any other suggestions?

Michael

On Thu, Nov 1, 2012 at 4:08 PM, K. John Wu <[email protected]> wrote:
> You will have to do AVG(a) and AVG(b) separately as follows
>
> SELECT AVG(a) WHERE a not null
> SELECT AVG(b) where b not null
>
> In
>
> SELECT AVG(a), AVG(b) WHERE a not null
>
> there is an implicit 'b not null' also applied, which will give you
> two rows instead of three as in the original example you give.
>
> John
>
>
> On 11/1/12 3:57 PM, Michael Beauregard wrote:
>> I see. I've noticed in past conversations that people generally deal
>> with this by using a magic value instead of NULLs. Which works fine in
>> simple cases like:
>>
>> SELECT AVG(a) WHERE a != <magic>
>>
>> However, I am in a situation where I need the query to return multiple
>> aggregations such as:
>>
>> SELECT AVG(a), AVG(b) WHERE 1 = 1
>>
>> and I would like AVG(a) to average all non-magic values of 'a' and
>> AVG(b) to average all non-magic values of 'b'. Any suggestions on now
>> to go about doing this?
>>
>> Thanks,
>>
>> Michael
>>
>> On Thu, Nov 1, 2012 at 3:48 PM, K. John Wu <[email protected]> wrote:
>>> Hi, Michael,
>>>
>>> The NULL values in CSV fiels are not ignored by ardea, but by the
>>> query processing code.  Basically, FastBit query results can not
>>> contain NULLs, so  'SELECT a WHERE 1 = 1' will return three rows while
>>> 'SELECT b WHERE 1 = 1' will only return two.
>>>
>>> John
>>>
>>>
>>> On 11/1/12 1:55 PM, Michael Beauregard wrote:
>>>> Hey John,
>>>>
>>>> I'm a bit confused as to how NULLs in csv files are treated by ardea.
>>>> Here is an experiment I just did:
>>>>
>>>> in "nulls.csv":
>>>>
>>>> 10,1
>>>> 20,
>>>> 30,3
>>>>
>>>> then load the data with ardea:
>>>>
>>>> ardea -d tmp/nulls_test -m a:int,b:int -t nulls.csv
>>>>
>>>> then query with ibis:
>>>>
>>>> ibis.sh -d tmp/nulls_test -q 'SELECT a, b WHERE 1 = 1' -o out.txt
>>>>
>>>> cat out.txt
>>>> 10, 1
>>>> 30, 3
>>>>
>>>> For some reason the csv row with a NULL in it is ignored by ardea.
>>>> This is unexpected for me, but I'm wondering if this is working as
>>>> designed or not. If this is expected behaviour, then I'm wondering how
>>>> I can specify NULL values when loading data.
>>>>
>>>> Thanks,
>>>>
>>>> Michael
>>>> _______________________________________________
>>>> FastBit-users mailing list
>>>> [email protected]
>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Re: [FastBit-users] csv with NULLs

Reply via email to