Hi, Mark,

The queries on the dimensions are fine because they will require only
the NULL mask for each dimension.  This is generally not too bad.
What you are saying about "bad for columnar database" is actually to
have many columns in the select clause.  This is what I remember from
this comparison <http://lbl.gov/%7Ekwu/ps/LBNL-62756.html>.

John


On 6/13/13 1:26 PM, Mark Hansen wrote:
> I have a sparsely populated table with n dimensions, and k metrics.
>  Like this:
> 
> d1, d2, ..., dn, m1, m2, ..., mk
> 
> Queries involve conditions on the dimensions, and retrieve a small
> subset of the dimensions and metrics.
> 
> The problem is that I usually have to specify all n dimensions in the
> where clause.  For example, lets say d1 is date, and I want to get m1
> where d1 between 2012-03-31 and 2012-05-30.  I need to do something like:
> 
> select d1, m1 where (d1 between 2012-03-31 and 2012-05-30) and (d2 IS
> NULL) and (d3 IS NULL) ... and (dn IS NULL)
> 
> I understand that this kind of query is slow in a columnar database,
> because all the dimensions columns must be examined.
> 
> Is there anyway in FastBit to make queries like this fast?
>  Essentially, I want to first get all the rows where ” (d2 IS NULL)
> and (d3 IS NULL) ... and (dn IS NULL)” and then run this simple query
> on the result:  “select d1, m1 where (d1 between 2012-03-31 and
> 2012-05-30)”.
> 
> I've thought about splitting up the table into dense pieces - but the
> problem is that there are "n choose 5" possible subsets of dimensions
> (where 5 dimensions are data and the rest are null).  Since n is
> around 18, that means something like 8568 tables.
> 
> Thanks in advance for any suggestions.
> 
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to