Hello all,

In my adoption scenario (~50 M records) I must execute queries with WHERE clauses. These clauses use EQ or IN operators on columns that are not part of the rowkey.
Unfortunately, the lack of secondary indexes in HBase leads to response times well above 1 minute. While this can be acceptable under many circumstances, it severely degrades the performance of the system I have built on top of Kylin (it is my understanding that each EQ condition or IN element triggers an HBase full scan).

I would like to know if anyone has come up with a solution or workaround. I think you guys already apply some client request filters [1] to some extent.

Has any of you tried to integrate the Kylin HBase client code with hindex [2]? I wonder if the coprocessor-based approach adopted by hindex might be effective, even though hindex does not come as a standalone jar, so deploying the hindex HBase fork is necessary (I do not know how reliable hindex is, and the latest commit is 6 months old). Besides, some changes to the Kylin HBase client code would be required (when creating cube HTables).

I have also had a quick look at Phoenix [3], which comes with secondary index support, but I wonder if it makes sense to integrate it with Kylin (in this case I think the Kylin HBase client code would have to be heavily modified to switch to the Phoenix APIs).

Long story short, I wonder if someone could give me a heads up and point me in the right direction.

Cheers,
luca

[1] http://hbase.apache.org/book.html#client.filter
[2] https://github.com/Huawei-Hadoop/hindex/tree/hbase-0.98
[3] https://phoenix.apache.org/secondary_indexing.html
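
P.S. To make the access pattern I am worried about concrete, here is a minimal Python sketch. It is a toy in-memory model, not HBase, Kylin, hindex or Phoenix code: the table contents, the column name and all function names are made up for illustration. It only contrasts a filtered scan, which (as far as I understand) still reads every row, with a lookup through a separate value-to-rowkey index, which is roughly what a secondary index provides.

```python
from collections import defaultdict

# Toy stand-in for a table: rowkey -> {column: value}.
# Hypothetical data; nothing here talks to a real HBase cluster.
table = {
    f"row{i:04d}": {"country": ["IT", "US", "CN", "DE"][i % 4], "amount": i}
    for i in range(1000)
}


def scan_with_filter(table, column, wanted):
    """Emulate a filtered scan: every row is read, non-matching rows are dropped."""
    rows_read = 0
    matches = []
    for rowkey in sorted(table):  # rows are visited in rowkey order
        rows_read += 1
        if table[rowkey][column] in wanted:
            matches.append(rowkey)
    return matches, rows_read


def build_index(table, column):
    """Build a value -> [rowkeys] map, i.e. a toy secondary index on one column."""
    index = defaultdict(list)
    for rowkey in sorted(table):
        index[table[rowkey][column]].append(rowkey)
    return index


def lookup_with_index(table, index, wanted):
    """Resolve an EQ/IN predicate through the index, then fetch only matching rows."""
    rows_read = 0
    matches = []
    for value in wanted:
        for rowkey in index.get(value, []):
            _row = table[rowkey]  # point lookup by rowkey
            rows_read += 1
            matches.append(rowkey)
    return sorted(matches), rows_read
```

With an IN predicate such as {"IT", "DE"} on "country", both functions return the same rowkeys, but the scan touches all 1000 rows while the indexed lookup touches only the matching ones. The price is that the index has to be built and kept in sync on every write, which is the part hindex and Phoenix would take care of.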
