Re: On improving WHEN statements performance on other columns

Luca Costabello Fri, 31 Jul 2015 02:50:42 -0700

Hello Li,

Thanks a lot for the heads up.


Indeed, I was trying to apply EQ and IN conditions to columns belonging to
a derived dimension.
I had not understood that such columns are not included in rowkey
generation, which is why I thought I needed a secondary index on HBase.

I have now added the columns involved in filters as normal dimensions, and
I get sub-second queries with EQ and IN statements as expected.

As a side note, I was a little misled by the "Auto Generator" wizard in the
cube creation UI (step 3): by default, the wizard adds all the selected
columns from a lookup table as a derived dimension. However, as you
mentioned, if a column will be used in EQ and IN conditions later on, it
should not be part of the derived dimension; it should be defined as a
normal dimension instead, so that it is included in the rowkey. Maybe an
additional info panel explaining this behaviour would be useful.

Also, I think the UI should make it clearer that the order of columns in
the rowkey matters for performance (although you did mention it in the
slide deck).
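
To illustrate why the column order matters, here is a minimal sketch in
plain Python. It does not use Kylin or HBase code: it only assumes that
rowkeys are stored sorted and that a scan reads a contiguous key range. The
dimension values and the names build_rowkeys and rows_scanned are
hypothetical.

```python
from bisect import bisect_left
from itertools import product

# Hypothetical dimension values, chosen only for illustration.
COUNTRIES = ["DE", "FR", "IT", "US"]
DAYS = range(1, 31)


def build_rowkeys(country_first):
    """Return sorted composite rowkeys; column order depends on the flag."""
    if country_first:
        return sorted(product(COUNTRIES, DAYS))
    return sorted(product(DAYS, COUNTRIES))


def rows_scanned(keys, country_first, country):
    """Count the rows a scan must read to answer `country = <value>`."""
    if not country_first:
        # Matching rows are scattered across the key space: read everything.
        return len(keys)
    # Matching rows are contiguous: seek to the prefix and stop after it.
    start = bisect_left(keys, (country,))
    stop = bisect_left(keys, (country + "\x00",))
    return stop - start


leading = rows_scanned(build_rowkeys(True), True, "IT")
trailing = rows_scanned(build_rowkeys(False), False, "IT")
print(leading, trailing)
```

With the filtered column first, the sketch reads 30 of the 120 rows; with
the filtered column last, it has to read all 120.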

I have also noticed that someone else has asked for clarification on the
definition of hierarchies:
https://issues.apache.org/jira/browse/KYLIN-887

Thanks,

luca


On Sat, Jul 11, 2015 at 2:12 AM, Li Yang <[email protected]> wrote:

> Hi Luca, could you give an example of your cube definition and query? I'm
> not 100% sure I understand the problem.
>
> > Such statements include EQ or IN operators and are not defined on
> > rowkeys.
> If a column is not on the rowkey, did you define it as derived? From a cube
> design point of view, such columns should be on the rowkey for best
> performance. It is even better if it is the first column of the rowkey,
> because then the EQ / IN condition will cut down the scan range
> significantly.
>
> Cheers
> Yang
>
> On Tue, Jul 7, 2015 at 4:28 AM, Julian Hyde <[email protected]> wrote:
>
> > Does your use case look like
> >
> >    …
> >    WHERE (CASE
> >                    WHEN condition1 THEN constant1
> >                    WHEN condition2 THEN constant2 …
> >                    END ) = constant1
> >
> > If so, https://issues.apache.org/jira/browse/CALCITE-727 may help. (The
> > fix is not in current Kylin, but maybe it could be in within a month or
> > so.)
> >
> > Julian
> >
> > On Jul 6, 2015, at 2:49 AM, Luca Costabello <[email protected]>
> > wrote:
> >
> > > Hello all,
> > >
> > > In my adoption scenario (~50 M records) I must execute queries with
> > > WHEN statements. Such statements include EQ or IN operators and are not
> > > defined on rowkeys.
> > >
> > > Unfortunately, the lack of secondary indexes in HBase leads to response
> > > times well above 1 minute. While this can be acceptable in many
> > > circumstances, it severely degrades the performance of the system I have
> > > built on top of Kylin (it is my understanding that each EQ condition or
> > > IN element triggers a full HBase scan).
> > >
> > > I would like to know if someone has come up with a solution or
> > > workaround.
> > > I think you guys already apply some client request filters [1] to some
> > > extent.
> > > Has anyone tried to integrate the Kylin HBase client code with hindex
> > > [2]?
> > > I wonder if the coprocessor-based approach adopted by hindex might be
> > > effective, even though hindex does not come as a standalone jar, so
> > > deploying the hindex HBase fork is necessary (I do not know how reliable
> > > hindex is, and the latest commit is 6 months old). Besides, some changes
> > > to the Kylin HBase client code would be required (when creating cube
> > > HTables).
> > > I have also had a quick look at Phoenix [3], which comes with secondary
> > > index support, but I wonder if it makes sense to integrate that with
> > > Kylin (in this case I think the Kylin HBase client code would have to be
> > > heavily modified to switch to the Phoenix APIs).
> > >
> > > Long story short, I wonder if someone could give me a heads up and
> > > point me in the right direction.
> > >
> > >
> > > Cheers,
> > > luca
> > >
> > > [1] http://hbase.apache.org/book.html#client.filter
> > > [2] https://github.com/Huawei-Hadoop/hindex/tree/hbase-0.98
> > > [3] https://phoenix.apache.org/secondary_indexing.html
> >
> >
>
