I was trying to spend a little time this weekend catching up with the current state of HBase integration for Hive. One thing that I haven't seen mentioned is how exactly Hive scans an HBase table during a SELECT.

Does Hive have logic that lets it scan only the regions that can contain matching rows when a SELECT query filters on the rowkey? If not, I recently wrote some code that lets a MapReduce job restrict its input to the regions overlapping a list of start/end rowkey ranges. If this might be useful to the Hive integration, I could open a JIRA issue and look into putting together a patch.
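
To make the question concrete, here is a minimal sketch of the region-selection idea. It is not the MapReduce code mentioned above and it does not call any HBase or Hive API. It assumes each region is described by a (start, end) rowkey pair, that start is inclusive and end is exclusive, and that an empty key means "unbounded", which is how HBase describes the first and last regions of a table. The names `overlaps` and `select_regions` are hypothetical.

```python
from typing import List, Tuple

# (start inclusive, end exclusive); b"" means "unbounded" on that side.
# This convention is an assumption modelled on HBase region boundaries.
KeyRange = Tuple[bytes, bytes]


def overlaps(region: KeyRange, scan: KeyRange) -> bool:
    """Return True if the region can hold at least one row of the scan range."""
    r_start, r_end = region
    s_start, s_end = scan
    # The region must begin before the scan range ends ...
    starts_before_scan_ends = s_end == b"" or r_start < s_end
    # ... and the scan range must begin before the region ends.
    ends_after_scan_starts = r_end == b"" or s_start < r_end
    return starts_before_scan_ends and ends_after_scan_starts


def select_regions(regions: List[KeyRange], scans: List[KeyRange]) -> List[KeyRange]:
    """Keep only the regions that overlap at least one requested rowkey range."""
    return [r for r in regions if any(overlaps(r, s) for s in scans)]


# Hypothetical table split into four regions.
regions = [(b"", b"g"), (b"g", b"n"), (b"n", b"t"), (b"t", b"")]
# Two requested rowkey ranges; only the first and third regions participate.
wanted = select_regions(regions, [(b"a", b"c"), (b"p", b"q")])
```

Python compares `bytes` lexicographically by unsigned byte value, which matches the order in which HBase sorts rowkeys, so the comparisons above behave the same way for binary keys. A job would then create input splits only for the regions in `wanted` instead of one per region in the table.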

Daniel Einspanjer
Metrics Architect
Mozilla Corporation
