I have a sparsely populated table with n dimensions and k metrics, like this:

    d1, d2, ..., dn, m1, m2, ..., mk

Queries involve conditions on the dimensions, and retrieve a small subset of the dimensions and metrics. The problem is that I usually have to specify all n dimensions in the where clause. For example, let's say d1 is a date, and I want to get m1 where d1 is between 2012-03-31 and 2012-05-30. I need to do something like:

    select d1, m1 where (d1 between 2012-03-31 and 2012-05-30) and (d2 IS NULL) and (d3 IS NULL) ... and (dn IS NULL)

I understand that this kind of query is slow in a columnar database, because every dimension column must be examined. Is there any way in FastBit to make queries like this fast?

Essentially, I want to first get all the rows where "(d2 IS NULL) and (d3 IS NULL) ... and (dn IS NULL)" and then run this simple query on the result:

    select d1, m1 where (d1 between 2012-03-31 and 2012-05-30)

I've thought about splitting the table into dense pieces, but the problem is that there are "n choose 5" possible subsets of dimensions (where 5 dimensions hold data and the rest are null). Since n is around 18, that means something like 8568 tables.

Thanks in advance for any suggestions.
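To make the shape of the problem concrete, here is a minimal Python sketch that assembles the where clause described above and checks the "n choose 5" figure. The helper name build_query and its parameters are hypothetical and used only for illustration. The sketch only builds a string in the same syntax as the query above; it does not call FastBit, and the clause is not verified against FastBit's query parser.

```python
from math import comb


def build_query(select_cols, range_col, lo, hi, all_dims, populated_dims):
    """Build a query string with one range condition plus an IS NULL
    condition for every dimension that is not populated.

    Hypothetical helper for illustration only; the output mirrors the
    syntax used in the post and is not checked against FastBit.
    """
    conditions = [f"({range_col} between {lo} and {hi})"]
    for dim in all_dims:
        # Every dimension outside the populated set needs its own condition,
        # which is why so many columns end up in the where clause.
        if dim != range_col and dim not in populated_dims:
            conditions.append(f"({dim} IS NULL)")
    return f"select {', '.join(select_cols)} where " + " and ".join(conditions)


n = 18  # approximate number of dimensions mentioned in the post
dims = [f"d{i}" for i in range(1, n + 1)]

query = build_query(
    ["d1", "m1"], "d1", "2012-03-31", "2012-05-30",
    all_dims=dims, populated_dims={"d1"},
)
print(query)

# Number of ways to pick which 5 of the n dimensions hold data.
print(comb(n, 5))
```

With n = 18 the generated clause carries 17 IS NULL conditions next to the single date range, and comb(18, 5) confirms the count of 8568 possible dense tables.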
