Basically, there are two ways to design the HBase rowkey for an inverted index:
1. "time + keyword":  
Splitting the index by time avoids the HBase region merge. But a single query 
may have to scan many scattered rows that are not sequential.
2. "keyword + time":  
This guarantees a sequential scan per keyword. But it may trigger the HBase 
region merge, since one keyword may be scattered across many regions.


So we can combine the two approaches as "coarse granularity time + keyword 
+ fine granularity time", for example "20150215 + abc + 1130". In this way, the 
"coarse granularity time" prefix avoids the HBase region merge, and the "fine 
granularity time" suffix guarantees the sequential scan.
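
To make the layout concrete, here is a minimal sketch in plain Python (no HBase 
client involved). The "|" separator, the function names and the in-memory 
"scan" are my own assumptions for illustration; the "+" above only denotes 
concatenation. It relies only on the fact that HBase sorts rowkeys 
lexicographically by bytes.

```python
from datetime import datetime

# Assumed separator (not part of the original proposal). It keeps keyword
# "ab" from matching as a prefix of keyword "abc"; keywords must not contain it.
SEP = b"|"


def make_rowkey(ts: datetime, keyword: str) -> bytes:
    """Build "coarse time + keyword + fine time", e.g. b"20150215|abc|1130"."""
    coarse = ts.strftime("%Y%m%d").encode()  # coarse granularity: one day
    fine = ts.strftime("%H%M").encode()      # fine granularity: minute of day
    return SEP.join([coarse, keyword.encode("utf-8"), fine])


def keyword_day_prefix(day: datetime, keyword: str) -> bytes:
    """Prefix shared by all rows of one keyword within one coarse bucket."""
    return SEP.join([day.strftime("%Y%m%d").encode(), keyword.encode("utf-8"), b""])


def prefix_scan(sorted_keys, prefix: bytes):
    """Stand-in for a prefix scan over rowkeys kept in byte order."""
    return [k for k in sorted_keys if k.startswith(prefix)]
```

Sorting a set of such keys and calling prefix_scan() with 
keyword_day_prefix(day, "abc") returns the rows for "abc" on that day as one 
contiguous run, ordered by the fine granularity time.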


Users can define a different "coarse granularity time" and "fine granularity 
time" for different cases. If the inverted index is only used for real-time 
queries, we can define a small "coarse granularity time" (e.g. 1 day). If the 
inverted index will cover the full data set, we can define a big "coarse 
granularity time" (e.g. 1 month).
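
As a rough illustration of that choice, the sketch below makes the granularity 
a parameter and lists the rowkey prefixes a query over a date range would have 
to scan for one keyword. The GRANULARITIES table, the "|" separator and the 
function names are hypothetical, not an existing API.

```python
from datetime import date, timedelta

# Hypothetical presets: (coarse time format, fine time format).
GRANULARITIES = {
    "day": ("%Y%m%d", "%H%M"),     # small buckets, e.g. real-time use
    "month": ("%Y%m", "%d%H%M"),   # large buckets, e.g. full data set
}


def make_rowkey(ts, keyword, granularity="day"):
    """Build "coarse time|keyword|fine time" for the chosen granularity."""
    coarse_fmt, fine_fmt = GRANULARITIES[granularity]
    return f"{ts.strftime(coarse_fmt)}|{keyword}|{ts.strftime(fine_fmt)}"


def scan_prefixes(start: date, end: date, keyword, granularity="day"):
    """List the distinct prefixes (one scan each) covering start..end inclusive."""
    coarse_fmt, _ = GRANULARITIES[granularity]
    prefixes = []
    cur = start
    while cur <= end:
        prefix = f"{cur.strftime(coarse_fmt)}|{keyword}|"
        if not prefixes or prefixes[-1] != prefix:
            prefixes.append(prefix)
        cur += timedelta(days=1)  # walk day by day; duplicates are skipped
    return prefixes
```

With these presets, a query for one keyword over all of February 2015 needs 28 
prefixes at "day" granularity but only one at "month" granularity, while a 
query for a single recent day needs one prefix at "day" granularity.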


Thanks
Jiang Xu
