Re: On improving WHEN statements performance on other columns

Li Yang Fri, 10 Jul 2015 18:13:40 -0700

Hi Luca, could you give an example of your cube definition and query? I'm
not 100% sure I understand the problem.


> Such statements include EQ or IN operators and are not defined on rowkeys.
If a column is not on the rowkey, did you define it as derived? From a cube
design point of view, such columns should be on the rowkey for best
performance, ideally as the first rowkey column, because then the
EQ / IN condition will cut down the scan range significantly.
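
To illustrate the point about scan range, here is a minimal Python sketch.
It is not Kylin or HBase code: the sorted list standing in for a table, the
(country, day) columns and the helper functions are all made up for
illustration. It only assumes that rows are stored sorted by rowkey, as they
are in HBase.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: rows sorted by a composite rowkey (country, day).
ROWS = sorted(
    (country, day)
    for country in ["DE", "FR", "IT", "US"]
    for day in range(1, 6)
)


def scan_leading(rows, country):
    """EQ on the leading rowkey column: seek straight to the matching range."""
    lo = bisect_left(rows, (country,))
    hi = bisect_right(rows, (country, float("inf")))
    return rows[lo:hi], hi - lo  # (matching rows, number of rows touched)


def scan_non_leading(rows, day):
    """EQ on a non-leading column: every row has to be examined."""
    matches = [row for row in rows if row[1] == day]
    return matches, len(rows)  # (matching rows, number of rows touched)


if __name__ == "__main__":
    _, touched = scan_leading(ROWS, "IT")
    print("leading column filter touched", touched, "of", len(ROWS), "rows")
    _, touched = scan_non_leading(ROWS, 3)
    print("non-leading column filter touched", touched, "of", len(ROWS), "rows")
```

An IN condition on the leading column would behave like several such range
seeks, one per listed value, rather than one pass over the whole table.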

Cheers
Yang

On Tue, Jul 7, 2015 at 4:28 AM, Julian Hyde <[email protected]> wrote:

> Does your use case look like
>
>    …
>    WHERE (CASE
>                    WHEN condition1 THEN constant1
>                    WHEN condition2 THEN constant2 …
>                    END ) = constant1
>
> If so, https://issues.apache.org/jira/browse/CALCITE-727 may help. (The
> fix is not in current Kylin, but maybe it could be in within a month or so.)
>
> Julian
>
> On Jul 6, 2015, at 2:49 AM, Luca Costabello <[email protected]>
> wrote:
>
> > Hello all,
> >
> > In my adoption scenario (~50 M records) I must execute queries with WHEN
> > statements. Such statements include EQ or IN operators and are not defined
> > on rowkeys.
> >
> > Unfortunately, the lack of secondary indexes in HBase leads to response
> > times that go well above 1 minute. While this can be acceptable under many
> > circumstances, it severely degrades the performance of the system I have
> > built over Kylin (it is my understanding that each EQ condition or IN
> > element triggers an HBase full scan).
> >
> > I would like to know if someone has come up with a solution or workaround.
> > I think you guys already apply some client request filters [1] to some
> > extent.
> > Have any of you tried to integrate Kylin HBase client code with hindex [2]?
> > I wonder if the coprocessor-based approach adopted by hindex might be
> > effective, even though hindex does not come as a standalone jar, so
> > deploying the hindex HBase fork is necessary (I do not know how reliable
> > hindex is, and the latest commit is 6 months old). Besides, some changes to
> > Kylin HBase client code would be required (when creating cube HTables).
> > I have also had a quick look at Phoenix [3], which comes with secondary
> > indexes support, but I wonder if it makes sense to integrate that with
> > Kylin (in this case I think Kylin HBase client code should be heavily
> > modified to switch to Phoenix APIs.)
> >
> > Long story short, I wonder if someone could give me a heads up and point me
> > in the right direction.
> >
> >
> > Cheers,
> > luca
> >
> > [1] http://hbase.apache.org/book.html#client.filter
> > [2] https://github.com/Huawei-Hadoop/hindex/tree/hbase-0.98
> > [3] https://phoenix.apache.org/secondary_indexing.html
>
>
