Hi, Michael, Sorry for the delayed response to your questions and suggestions. It appears that I have to split the tests in ibis::util::readInt and ibis::util::readUInt the same way as it is in ibis::util::readDouble. This change should address your test case correctly. The updated source code is checked in as SVN revision 601. Please give it a try when you get the chance.
Thanks. John On 11/7/12 4:20 PM, Michael Beauregard wrote: > Hey John, > > I had a quick look at this problem and found a trivial fix for it. > Please have a look at the attached patch. > > Thanks! > > Michael > > > On Tue, Nov 6, 2012 at 4:27 PM, Michael Beauregard > <[email protected] <mailto:[email protected]>> wrote: > > Thankfully I'm able to avoid that problematic situation > altogether. However, I have found what appears to be an issue with > loading data from csv that has NULL values. Here is an example: > > in "nulls.csv": > > 1,1,1,1 > 2,2,,2 > 3,,3,3 > ,4,4,4 > > then load the data with ardea: > > ardea -d tmp/nulls_test -m a:int,b:int,c:int,d:int -t nulls.csv -v 1 > > ...yields the following surprising output: > > -- begin printing table in tmp/nulls_test -- > Table (on disk) T-nulls_test (tmp/nulls_test) consists of 1 > partition with 4 columns and 4 rows > aINT > bINT > cINT > dINT > 1, 1, 1, 1 > 2, 2, 2147483647 <tel:2147483647>, 2147483647 <tel:2147483647> > 3, 2147483647 <tel:2147483647>, 2147483647 <tel:2147483647>, > 2147483647 <tel:2147483647> > 2147483647 <tel:2147483647>, 2147483647 <tel:2147483647>, > 2147483647 <tel:2147483647>, 2147483647 <tel:2147483647> > -- end printing table in tmp/nulls_test -- > > Notice that once ardea hits a NULL value, all subsequent values in > that row become NULL. I'm currently using svn rev 582. > > Thanks, > > Michael > > > On Fri, Nov 2, 2012 at 8:56 PM, K. John Wu <[email protected] > <mailto:[email protected]>> wrote: > > Hi, Michael, > > I understand your request, however, the current implementation in > FastBit deals with group by operations by assuming all values > given to > them are valid (not NULL). Clearly, this is a very limited > option, > however, it covers a good number of use cases. > > It seems to me that I will need quite a bit of time to add the > support > for NULL values in group-by operations. If you have a quick > way to > get this done, please go ahead with modification or let me > know how to > do it.. > > John > > > On 11/1/12 4:25 PM, Michael Beauregard wrote: > > Thanks for the quick response. > > > > I was afraid I you were going to suggest that. I think I over > > simplified the query making it a poor example of what I > intend to do. > > I need something more like: > > > > SELECT a, b, AVG(c), AVG(d), AVG(e), AVG(f), AVG(g), AVG(h), > AVG(i), > > AVG(j) WHERE a NOT IN (1, 2, 3) AND b NOT IN (3, 4, 5) AND > <some other > > constraints> > > > > and columns 'c' through 'j' can have NULL values at any point. > > > > Splitting that into 8 queries: > > > > SELECT a, b, AVG(c) WHERE c NOT NULL AND ... > > SELECT a, b, AVG(d) WHERE d NOT NULL AND ... > > ... > > > > means that I'd have to somehow combine those results after > the fact > > and I'd like to avoid that if possible. > > > > Could I perform the main query without specifying 'c' > through 'j' > > aggregations, then using the api to compute each average on the > > resulting table so that the non-aggregation part of the > query doesn't > > have to be repeated? Like: > > > > SELECT a, b WHERE a NOT IN (1, 2, 3) AND b NOT IN (3, 4, 5) > AND <some > > other constraints> > > > > Then use the resulting table instance to compute > aggregations AVG('c') > > through AVG('j'). > > > > Or any other suggestions? > > > > Michael > > > > On Thu, Nov 1, 2012 at 4:08 PM, K. John Wu <[email protected] > <mailto:[email protected]>> wrote: > >> You will have to do AVG(a) and AVG(b) separately as follows > >> > >> SELECT AVG(a) WHERE a not null > >> SELECT AVG(b) where b not null > >> > >> In > >> > >> SELECT AVG(a), AVG(b) WHERE a not null > >> > >> there is an implicit 'b not null' also applied, which will > give you > >> two rows instead of three as in the original example you give. > >> > >> John > >> > >> > >> On 11/1/12 3:57 PM, Michael Beauregard wrote: > >>> I see. I've noticed in past conversations that people > generally deal > >>> with this by using a magic value instead of NULLs. Which > works fine in > >>> simple cases like: > >>> > >>> SELECT AVG(a) WHERE a != <magic> > >>> > >>> However, I am in a situation where I need the query to > return multiple > >>> aggregations such as: > >>> > >>> SELECT AVG(a), AVG(b) WHERE 1 = 1 > >>> > >>> and I would like AVG(a) to average all non-magic values of > 'a' and > >>> AVG(b) to average all non-magic values of 'b'. Any > suggestions on now > >>> to go about doing this? > >>> > >>> Thanks, > >>> > >>> Michael > >>> > >>> On Thu, Nov 1, 2012 at 3:48 PM, K. John Wu <[email protected] > <mailto:[email protected]>> wrote: > >>>> Hi, Michael, > >>>> > >>>> The NULL values in CSV fiels are not ignored by ardea, > but by the > >>>> query processing code. Basically, FastBit query results > can not > >>>> contain NULLs, so 'SELECT a WHERE 1 = 1' will return > three rows while > >>>> 'SELECT b WHERE 1 = 1' will only return two. > >>>> > >>>> John > >>>> > >>>> > >>>> On 11/1/12 1:55 PM, Michael Beauregard wrote: > >>>>> Hey John, > >>>>> > >>>>> I'm a bit confused as to how NULLs in csv files are > treated by ardea. > >>>>> Here is an experiment I just did: > >>>>> > >>>>> in "nulls.csv": > >>>>> > >>>>> 10,1 > >>>>> 20, > >>>>> 30,3 > >>>>> > >>>>> then load the data with ardea: > >>>>> > >>>>> ardea -d tmp/nulls_test -m a:int,b:int -t nulls.csv > >>>>> > >>>>> then query with ibis: > >>>>> > >>>>> ibis.sh -d tmp/nulls_test -q 'SELECT a, b WHERE 1 = 1' > -o out.txt > >>>>> > >>>>> cat out.txt > >>>>> 10, 1 > >>>>> 30, 3 > >>>>> > >>>>> For some reason the csv row with a NULL in it is ignored > by ardea. > >>>>> This is unexpected for me, but I'm wondering if this is > working as > >>>>> designed or not. If this is expected behaviour, then I'm > wondering how > >>>>> I can specify NULL values when loading data. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Michael > >>>>> _______________________________________________ > >>>>> FastBit-users mailing list > >>>>> [email protected] > <mailto:[email protected]> > >>>>> > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users > >>>>> > > > _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
