Basically, there are two ways to design the HBase rowkey for an inverted index:
1. "time + keyword": This splits the index by time, which avoids HBase region merges. But a query for one keyword may have to scan many scattered, non-sequential rows.
2. "keyword + time": This guarantees a sequential scan for each keyword. But it may trigger HBase region merges, since one keyword may be spread across many regions.
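To make the trade-off concrete, here is a small Python sketch of how the two layouts order the same entries. It only models the fact that HBase keeps rows sorted lexicographically by rowkey. The sample entries, the helper names, and the plain string concatenation are assumptions for illustration, not an actual schema.

```python
# Hypothetical index entries: (timestamp as yyyyMMddHHmm, keyword).
entries = [
    ("201502151130", "abc"),
    ("201502151130", "xyz"),
    ("201502151131", "abc"),
    ("201502151131", "xyz"),
]

def time_first(ts, keyword):
    # Layout 1: "time + keyword"
    return ts + keyword

def keyword_first(ts, keyword):
    # Layout 2: "keyword + time"
    return keyword + ts

# sorted() on ASCII strings stands in for HBase's byte-wise rowkey ordering.
time_first_keys = sorted(time_first(ts, kw) for ts, kw in entries)
keyword_first_keys = sorted(keyword_first(ts, kw) for ts, kw in entries)

# Positions of the rows that belong to keyword "abc" in each layout.
abc_in_time_first = [i for i, k in enumerate(time_first_keys) if k.endswith("abc")]
abc_in_keyword_first = [i for i, k in enumerate(keyword_first_keys) if k.startswith("abc")]

print(time_first_keys)       # rows for "abc" are interleaved with "xyz"
print(keyword_first_keys)    # rows for "abc" are adjacent
```

With the time-first layout the rows for "abc" are separated by rows of other keywords, so a keyword query has to skip over them. With the keyword-first layout they sit next to each other.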
So we can combine the two approaches: "coarse-granularity time + keyword + fine-granularity time", for example "20150215 + abc + 1130". The coarse-granularity time avoids HBase region merges, and the fine-granularity time keeps the scan for a keyword sequential. Users can define different coarse and fine granularities for different cases. If the inverted index is only used for real-time queries, we can define a small coarse granularity (e.g. 1 day). If the inverted index has to cover the full data set, we can define a large coarse granularity (e.g. 1 month).

Thanks,
Jiang Xu
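A minimal Python sketch of the combined rowkey, assuming plain string concatenation and `strftime` patterns for the two granularities. The function names `make_rowkey` and `scan_prefix` are hypothetical. A real key would probably also need a delimiter or a fixed-width keyword so that a keyword that is a prefix of another does not collide with it.

```python
from datetime import datetime

def make_rowkey(keyword, ts, coarse_fmt="%Y%m%d", fine_fmt="%H%M"):
    """Build "coarse time + keyword + fine time" as one string.

    coarse_fmt and fine_fmt are illustrative defaults (1 day / 1 minute).
    """
    return ts.strftime(coarse_fmt) + keyword + ts.strftime(fine_fmt)

def scan_prefix(keyword, ts, coarse_fmt="%Y%m%d"):
    """Prefix shared by all rows of one keyword inside one coarse bucket."""
    return ts.strftime(coarse_fmt) + keyword

ts = datetime(2015, 2, 15, 11, 30)

# Daily coarse granularity, as in the example "20150215 + abc + 1130".
daily_key = make_rowkey("abc", ts)

# Monthly coarse granularity for an index that covers the full data set;
# the fine part then carries day, hour and minute.
monthly_key = make_rowkey("abc", ts, coarse_fmt="%Y%m", fine_fmt="%d%H%M")

print(daily_key)
print(monthly_key)
```

All rows for one keyword within one coarse bucket share the prefix returned by `scan_prefix`, so they can be read with a single contiguous scan. A query that spans several buckets would issue one such scan per bucket.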
