This is a very interesting idea. In fact, many less general solutions (based on talks with various people we have met) took exactly this approach.
This feature will benefit users whose Hadoop clusters are hosted on a cloud service. Fewer cuboids mean fewer CPU cycles, and that means less to pay.

Yang

On Wed, Dec 24, 2014 at 1:47 PM, hongbin ma <[email protected]> wrote:
> Logically, a cube contains cuboids representing all combinations of
> dimensions (n dimensions yield 2^n cuboids). Clearly, a naive cube-building
> strategy that materializes all cuboids will quickly run into the curse of
> dimensionality. Currently Kylin leverages a strategy called "aggregation
> groups" to reduce the number of cuboids that need to be materialized.
>
> However, if the query pattern is simple and fixed, the "aggregation group"
> strategy is still not efficient enough. For example, suppose there are five
> dimensions, namely A, B, C, D and E. The data modeler is sure that only the
> combinations (A,B,C), (D,E) and (A,E) will be queried, so he'll use the
> aggregation group tool to optimize his cube definition. However, whichever
> aggregation groups he chooses, many useless combinations will still be
> materialized.
>
> With a new strategy called "cuboid whitelist", data modelers can guide
> Kylin to materialize only the cuboids they are interested in. Based on the
> whitelist, Kylin will materialize the minimal set of cuboids needed to cover
> each cuboid in the whitelist. To support this, the following functionality
> should be added:
>
> 1. A front-end/UI for specifying whitelist members and persisting them to
> the cube description.
> 2. An enhanced job engine scheduler that calculates a minimal spanning
> build tree based on the whitelist.
> 3. (OPTIONAL) An enhanced job engine that supports a dynamic whitelist,
> triggering new builds for newly added whitelist members.
>
> Hongbin Ma
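The following rough sketch shows what the whitelist-driven build tree in item 2 might compute. It is written in Python purely for illustration. It is not Kylin code, and `mask`, `name` and `build_tree` are hypothetical helpers. The sketch assumes the base cuboid (all dimensions) is always built first from the source data. Each cuboid is encoded as a bitmask over the dimensions, and every whitelisted cuboid is aggregated from its smallest already-materialized superset. That parent choice is only a heuristic for the cheapest parent, not a proven-minimal plan.

```python
from typing import Dict, List, Optional

# Dimension names from the example in the proposal.
DIMS = "ABCDE"


def mask(dims: str) -> int:
    """Encode a cuboid (a set of dimension names) as a bitmask over DIMS."""
    m = 0
    for d in dims:
        m |= 1 << DIMS.index(d)
    return m


def name(m: int) -> str:
    """Decode a bitmask back into dimension names, e.g. 0b10001 -> 'AE'."""
    return "".join(d for i, d in enumerate(DIMS) if (m >> i) & 1) or "()"


def build_tree(whitelist: List[str]) -> Dict[int, Optional[int]]:
    """Map each cuboid to build onto the cuboid it is aggregated from.

    The base cuboid maps to None (assumed to be built from source data).
    Every other whitelisted cuboid is aggregated from its smallest
    already-materialized strict superset.
    """
    base = (1 << len(DIMS)) - 1
    targets = {mask(c) for c in whitelist} - {base}
    # Larger cuboids first, so every candidate parent is already materialized.
    order = sorted(targets, key=lambda m: (-bin(m).count("1"), m))
    parent: Dict[int, Optional[int]] = {base: None}
    for c in order:
        # Any materialized strict superset of c can serve as its parent;
        # the base cuboid always qualifies, so this list is never empty.
        supersets = [p for p in parent if p & c == c and p != c]
        # Heuristic: a parent with fewer dimensions usually has fewer rows.
        parent[c] = min(supersets, key=lambda p: bin(p).count("1"))
    return parent


if __name__ == "__main__":
    # Whitelist from the proposal plus a hypothetical (A) cuboid.
    tree = build_tree(["ABC", "DE", "AE", "A"])
    for child, par in tree.items():
        src = "source data" if par is None else name(par)
        print(f"{name(child)} <- {src}")
    print(f"{len(tree)} cuboids instead of {2 ** len(DIMS)}")
```

With the whitelist from the proposal plus the hypothetical (A), only 5 cuboids are built instead of the 2^5 = 32 of a full cube. (A) is aggregated from (A,E) rather than from the base cuboid, while (A,B,C), (D,E) and (A,E) all come straight from the base cuboid.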
