I have a sparsely populated table with n dimensions, and k metrics.  Like
this:

d1, d2, ..., dn, m1, m2, ..., mk

Queries involve conditions on the dimensions, and retrieve a small subset
of the dimensions and metrics.

The problem is that I usually have to specify all n dimensions in the where
clause.  For example, lets say d1 is date, and I want to get m1 where d1
between 2012-03-31 and 2012-05-30.  I need to do something like:

select d1, m1 where (d1 between 2012-03-31 and 2012-05-30) and (d2 IS NULL)
and (d3 IS NULL) ... and (dn IS NULL)

I understand that this kind of query is slow in a columnar database,
because all the dimensions columns must be examined.

Is there anyway in FastBit to make queries like this fast?  Essentially, I
want to first get all the rows where ” (d2 IS NULL) and (d3 IS NULL) ...
and (dn IS NULL)” and then run this simple query on the result:  “select
d1, m1 where (d1 between 2012-03-31 and 2012-05-30)”.

I've thought about splitting up the table into dense pieces - but the
problem is that there are "n choose 5" possible subsets of dimensions
(where 5 dimensions are data and the rest are null).  Since n is around 18,
that means something like 8568 tables.

Thanks in advance for any suggestions.
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to