I was trying to spend a little time this weekend catching up with the
current state of HBase integration for Hive. One thing that I haven't
seen mentioned is how exactly Hive scans an HBase table during a SELECT.
Does Hive have logic that allows it to intelligently scan only the
participating regions during a SELECT query that uses the rowkey? If
not, I recently wrote some code that lets a MapReduce job select only
the regions that overlap a list of start/end rowkey ranges. If this
might be useful to the Hive integration, I could open a JIRA issue and
put together a patch.
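
To illustrate the kind of region selection described above, here is a
minimal sketch in plain Python. It is not the MapReduce code itself and
does not call the HBase API. It assumes each region is a (start, end)
pair of byte strings where the end key is exclusive and an empty string
means "unbounded", which mirrors how HBase marks the first and last
regions of a table. The name select_regions and the sample keys are
hypothetical.

```python
from typing import List, Tuple

KeyRange = Tuple[bytes, bytes]  # (start inclusive, end exclusive); b"" = unbounded


def overlaps(region: KeyRange, wanted: KeyRange) -> bool:
    """Return True if the region's key span intersects the wanted range."""
    r_start, r_end = region
    w_start, w_end = wanted
    # The region must begin before the wanted range ends...
    begins_before_end = w_end == b"" or r_start < w_end
    # ...and the wanted range must begin before the region ends.
    ends_after_start = r_end == b"" or w_start < r_end
    return begins_before_end and ends_after_start


def select_regions(regions: List[KeyRange], ranges: List[KeyRange]) -> List[KeyRange]:
    """Keep only the regions that overlap at least one requested range."""
    return [r for r in regions if any(overlaps(r, w) for w in ranges)]


if __name__ == "__main__":
    # Hypothetical table split into four regions.
    regions = [(b"", b"g"), (b"g", b"n"), (b"n", b"t"), (b"t", b"")]
    # Two rowkey ranges of interest.
    ranges = [(b"a", b"c"), (b"p", b"q")]
    for start, end in select_regions(regions, ranges):
        print(start, end)
```

With a filter like this, a job would only need to create input splits
for the regions that come back, instead of one per region in the table.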
Daniel Einspanjer
Metrics Architect
Mozilla Corporation