Hello Li,

Thanks a lot for the heads-up.
Indeed, I was trying to apply EQ and IN statements on columns belonging to a derived dimension. I had not realized that such columns are not included in the rowkey, which is why I thought I needed a secondary index on HBase. I have now added the columns involved in filters as normal dimensions, and I get sub-second queries with EQ and IN statements, as expected.

As a side note, I was a little misled by the "Auto Generator" wizard in the cube creation UI (step 3): by default, the wizard adds all the selected columns from a lookup table as a derived dimension. However, as you mentioned, a column that will later be used in EQ and IN statements should not be part of a derived dimension; it should be defined as a normal dimension instead, so that it is included in the rowkey. Maybe an additional info panel explaining this behaviour would be useful.

Also, I think the UI should make it clearer that the order of columns in the rowkey matters for performance (although you did write this in the slide deck).

I have also noticed that someone else has asked for clarification about the definition of hierarchies:
https://issues.apache.org/jira/browse/KYLIN-887

Thanks,
luca

On Sat, Jul 11, 2015 at 2:12 AM, Li Yang <[email protected]> wrote:

> Hi Luca, could you give an example of your cube definition and query? I'm
> not 100% sure I understand the problem.
>
> > Such statements include EQ or IN operators and are not defined on
> > rowkeys.
>
> If a column is not on rowkey, then you defined it as derived? From a cube
> design point of view, such columns should be on rowkey for best
> performance. And better to be the first column of rowkey, because then the
> EQ / IN condition will cut down the scan range significantly.
>
> Cheers
> Yang
>
> On Tue, Jul 7, 2015 at 4:28 AM, Julian Hyde <[email protected]> wrote:
>
> > Does your use case look like
> >
> > …
> > WHERE (CASE
> > WHEN condition1 THEN constant1
> > WHEN condition2 THEN constant2 …
> > END ) = constant1
> >
> > If so, https://issues.apache.org/jira/browse/CALCITE-727 may help. (The
> > fix is not in current Kylin, but maybe it could be in within a month or
> > so.)
> >
> > Julian
> >
> > On Jul 6, 2015, at 2:49 AM, Luca Costabello <[email protected]>
> > wrote:
> >
> > > Hello all,
> > >
> > > In my adoption scenario (~50 M records) I must execute queries with
> > > WHEN statements. Such statements include EQ or IN operators and are
> > > not defined on rowkeys.
> > >
> > > Unfortunately, the lack of secondary indexes in HBase determines
> > > response times that go well above 1 minute. While this can be
> > > acceptable under many circumstances, it severely degrades the
> > > performance of the system I have built over Kylin (it is my
> > > understanding that each EQ condition or IN element determines an
> > > HBase full scan).
> > >
> > > I would like to know if someone has come up with a solution or
> > > workaround. I think you guys already apply some client request
> > > filters [1] to some extent.
> > > Has any of you tried to integrate Kylin HBase client code with hindex
> > > [2]? I wonder if the coprocessor-based approach adopted by hindex
> > > might be effective - even though hindex does not come as a standalone
> > > jar, so deploying the hindex HBase fork is necessary (I am not aware
> > > of how reliable hindex is, and the latest commit is 6 months old).
> > > Besides, some change to Kylin HBase client code would be required
> > > (when creating cube HTables).
> > > I have also had a quick look at Phoenix [3], which comes with
> > > secondary index support, but I wonder if it makes sense to integrate
> > > that with Kylin (in this case I think Kylin HBase client code should
> > > be heavily modified to switch to Phoenix APIs).
> > >
> > > Long story short, I wonder if someone could give me a heads up and
> > > point me in the right direction.
> > >
> > > Cheers,
> > > luca
> > >
> > > [1] http://hbase.apache.org/book.html#client.filter
> > > [2] https://github.com/Huawei-Hadoop/hindex/tree/hbase-0.98
> > > [3] https://phoenix.apache.org/secondary_indexing.html
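
The rowkey advice in this thread can be illustrated with a toy model. The sketch below is not Kylin or HBase code: it assumes a table is simply a list sorted by a composite key, and the names `build_table` and `scan_in`, the column names, and the data are all made up for illustration. It only counts how many rows an EQ / IN filter has to touch when the filtered column leads the rowkey versus when it does not.

```python
from bisect import bisect_left
from itertools import product


def build_table(rows, key_order):
    """Sort rows by a composite key, loosely mimicking sorted rowkeys."""
    table = [(tuple(r[c] for c in key_order), r) for r in rows]
    table.sort(key=lambda entry: entry[0])
    return table


def scan_in(table, key_order, column, values):
    """Return (matches, rows_touched) for `column IN values` (EQ = one value).

    Toy assumption: a filter on the leading key column becomes one short
    range scan per value; a filter on any other column reads every row.
    """
    pos = key_order.index(column)
    if pos != 0:
        # Column is not the key prefix: full scan, filter afterwards.
        matches = [row for key, row in table if key[pos] in values]
        return matches, len(table)

    keys = [key for key, _ in table]
    matches, touched = [], 0
    for value in values:
        # (value,) sorts before every key that starts with value.
        start = bisect_left(keys, (value,))
        stop = start
        while stop < len(table) and keys[stop][0] == value:
            stop += 1
        matches.extend(row for _, row in table[start:stop])
        touched += stop - start
    return matches, touched


# Hypothetical data: 5 countries x 4 categories x 10 days = 200 rows.
rows = [
    {"country": c, "category": k, "day": d}
    for c, k, d in product(["DE", "FR", "IT", "UK", "US"], "abcd", range(10))
]

country_first = ("country", "category", "day")
country_last = ("day", "category", "country")

table_first = build_table(rows, country_first)
table_last = build_table(rows, country_last)

eq_first = scan_in(table_first, country_first, "country", ["IT"])
eq_last = scan_in(table_last, country_last, "country", ["IT"])
in_first = scan_in(table_first, country_first, "country", ["IT", "FR"])

print(len(eq_first[0]), eq_first[1])  # matches, rows touched
print(len(eq_last[0]), eq_last[1])
print(len(in_first[0]), in_first[1])
```

In this toy model both key orders return the same matching rows; only the number of rows read differs, which is the effect Li Yang describes for putting the filtered column first in the rowkey.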
