On improving WHEN statements performance on other columns

Luca Costabello Mon, 06 Jul 2015 02:50:36 -0700

Hello all,

In my adoption scenario (~50 M records) I must execute queries with WHERE
clauses. These clauses use EQ or IN operators on columns that are not part
of the rowkey.


Unfortunately, the lack of secondary indexes in HBase leads to response
times well above 1 minute. While this can be acceptable under many
circumstances, it severely degrades the performance of the system I have
built on top of Kylin (it is my understanding that each EQ condition or IN
element triggers a full HBase scan).
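
To make the cost difference concrete, here is a minimal sketch in plain
Python (not Kylin or HBase code). It assumes a toy table kept sorted by
rowkey, and compares a rowkey prefix scan with a filter on a non-rowkey
column. The rowkey layout, the column names and the data are all made up
for illustration.

```python
import bisect

# Toy model of a table sorted by rowkey (hypothetical data and layout).
ROWS = sorted(
    (
        (f"{country}|{i:04d}",
         {"country": country, "status": "gold" if i % 50 == 0 else "basic"})
        for country in ("DE", "IT", "US")
        for i in range(1000)
    ),
    key=lambda row: row[0],
)
ROWKEYS = [key for key, _ in ROWS]


def rowkey_prefix_scan(prefix):
    """Range scan: binary-search to the prefix, count only matching rows."""
    start = bisect.bisect_left(ROWKEYS, prefix)
    examined, hits = 0, []
    for key, _cols in ROWS[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        examined += 1
        hits.append(key)
    return hits, examined


def filtered_full_scan(column, wanted):
    """Filter on a non-rowkey column: every row is read and tested."""
    examined, hits = 0, []
    for key, cols in ROWS:
        examined += 1
        if cols[column] in wanted:
            hits.append(key)
    return hits, examined
```

In this toy model the prefix scan examines only the rows it returns, while
the column filter examines the whole table regardless of how few rows match.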

I would like to know if anyone has come up with a solution or workaround.
I think you guys already apply some client request filters [1] to some
extent.
Have any of you tried to integrate the Kylin HBase client code with hindex
[2]? I wonder if the coprocessor-based approach adopted by hindex might be
effective, even though hindex does not come as a standalone jar, so
deploying the hindex HBase fork is necessary (I do not know how reliable
hindex is, and the latest commit is 6 months old). Besides, some changes to
the Kylin HBase client code would be required (when creating cube HTables).
I have also had a quick look at Phoenix [3], which comes with secondary
index support, but I wonder if it makes sense to integrate it with
Kylin (in this case I think the Kylin HBase client code would have to be
heavily modified to switch to the Phoenix APIs).
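
For reference, the general idea behind a secondary index can be sketched
in a few lines of plain Python. This is only a conceptual model under my
own assumptions, not how hindex or Phoenix actually implement it: the
class name IndexedTable and its methods are hypothetical. The index maps
each column value to the rowkeys holding it and is updated on every write.

```python
from collections import defaultdict


class IndexedTable:
    """Toy table with one secondary index on a single non-rowkey column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # rowkey -> {column: value}
        self.index = defaultdict(set)  # column value -> set of rowkeys

    def put(self, rowkey, columns):
        # Drop the stale index entry if this rowkey is being overwritten.
        old = self.rows.get(rowkey)
        if old is not None:
            self.index[old[self.indexed_column]].discard(rowkey)
        self.rows[rowkey] = columns
        self.index[columns[self.indexed_column]].add(rowkey)

    def where_in(self, values):
        """EQ is the single-value case of IN: one index lookup per value."""
        rowkeys = set()
        for value in values:
            rowkeys |= self.index.get(value, set())
        return sorted(rowkeys)
```

The trade-off is that every write now has to touch the index as well as the
row, in exchange for EQ and IN lookups that no longer read the whole table.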

Long story short, I wonder if someone could give me a heads up and point me
in the right direction.


Cheers,
luca

[1] http://hbase.apache.org/book.html#client.filter
[2] https://github.com/Huawei-Hadoop/hindex/tree/hbase-0.98
[3] https://phoenix.apache.org/secondary_indexing.html
