[Coprocessors] Add hooks for bulk loading actions
-------------------------------------------------
Key: HBASE-5500
URL: https://issues.apache.org/jira/browse/HBASE-5500
Project: HBase
Issue Type: Improvement
Components: coprocessors
Reporter: Andrew Purtell
The API gap for bulk HFile loading was discussed on the mailing list, but it
never made it into a JIRA. It also came up in HBASE-5498.
See http://search-hadoop.com/m/eEUHK1s4fo81/bulk+loading+and+RegionObservers
The salient detail:
{quote}
A simple and straightforward course of action is to give the CP the option
of rewriting the submitted store file(s) before the regionserver attempts to
validate and move them into the store. This is similar to how CPs are hooked
into compaction: CPs hook compaction by allowing one to wrap the scanner that
is iterating over the store files. So the wrapper gets a chance to examine the
KeyValues being processed and also has an opportunity to modify or drop them.
Similarly for incoming HFiles for bulk load, the CP could be given a
scanner iterating over those files, if you had a RegionObserver installed. You
would be given the option in effect to rewrite the incoming HFiles before they
are handed over to the RegionServer for addition to the region.
{quote}
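The scanner-wrapping idea in the quote can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not HBase code: `Kv`, `KvScanner`, `ListScanner` and `RewritingScanner` are hypothetical stand-ins for a KeyValue and a scanner over the incoming store files. The wrapper pulls cells from the delegate and passes each one through a coprocessor-supplied function that may return a rewritten cell, or `null` to drop it. That is the same examine/modify/drop opportunity the quote describes for compaction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

class ScannerWrapSketch {
    public static void main(String[] args) {
        List<Kv> incoming = List.of(
                new Kv("row1", "keep"),
                new Kv("row2", "secret"),
                new Kv("row3", "keep"));
        // Example policy a coprocessor might supply: drop any cell whose value
        // is "secret" and upper-case the value of every other cell.
        UnaryOperator<Kv> policy = kv ->
                kv.value.equals("secret") ? null : new Kv(kv.row, kv.value.toUpperCase());
        KvScanner wrapped = new RewritingScanner(new ListScanner(incoming), policy);
        System.out.println(drain(wrapped)); // [row1=KEEP, row3=KEEP]
    }

    /** Pulls every remaining cell out of a scanner into a list. */
    static List<Kv> drain(KvScanner scanner) {
        List<Kv> out = new ArrayList<>();
        for (Kv kv = scanner.next(); kv != null; kv = scanner.next()) {
            out.add(kv);
        }
        return out;
    }
}

/** Hypothetical stand-in for an HBase KeyValue: just a row key and a value. */
final class Kv {
    final String row;
    final String value;

    Kv(String row, String value) {
        this.row = row;
        this.value = value;
    }

    @Override
    public String toString() {
        return row + "=" + value;
    }
}

/** Hypothetical scanner over the cells of the incoming files. */
interface KvScanner {
    /** Returns the next cell, or null when the scanner is exhausted. */
    Kv next();
}

/** Scanner backed by an in-memory list, standing in for an HFile reader. */
final class ListScanner implements KvScanner {
    private final Iterator<Kv> it;

    ListScanner(List<Kv> cells) {
        this.it = cells.iterator();
    }

    @Override
    public Kv next() {
        return it.hasNext() ? it.next() : null;
    }
}

/** Wraps another scanner; the rewrite function returns a new cell, or null to drop it. */
final class RewritingScanner implements KvScanner {
    private final KvScanner delegate;
    private final UnaryOperator<Kv> rewrite;

    RewritingScanner(KvScanner delegate, UnaryOperator<Kv> rewrite) {
        this.delegate = delegate;
        this.rewrite = rewrite;
    }

    @Override
    public Kv next() {
        Kv kv;
        while ((kv = delegate.next()) != null) {
            Kv out = rewrite.apply(kv);
            if (out != null) {
                return out; // pass the (possibly rewritten) cell through
            }
            // rewrite returned null: the cell is dropped, keep pulling
        }
        return null;
    }
}
```

Because the wrapper has the same interface as the scanner it wraps, the server side does not need to know whether a coprocessor is rewriting anything.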
I think this is a reasonable approach to interface design, because the fact that
you are given a scanner highlights the bulk nature of the input. However, I think
there should be two hooks here: one that gives a simple yes/no answer as to
whether the bulk load should proceed, and one that allows more expensive
filtering or transformation via a scanner-like interface. Bulk loads can be very
large, so always requiring a scan over them is not a good idea.
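The two-hook split can also be sketched in plain Java. Again this is a hypothetical illustration, not the HBase RegionObserver API: `BulkLoadObserver`, `preBulkLoad`, `cellRewriter` and `bulkLoad` are invented names, the file path is made up, and cells are plain strings. The cheap hook sees only the column family and file paths and can veto the load. The expensive hook is opt-in: when an observer supplies no rewriter, the files are accepted without any scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.UnaryOperator;

class TwoHookBulkLoadSketch {
    /** Hypothetical observer with the two hooks proposed in this issue. */
    interface BulkLoadObserver {
        /** Cheap yes/no hook: sees only metadata, never the file contents. */
        default boolean preBulkLoad(String family, List<String> hfilePaths) {
            return true;
        }

        /**
         * Optional expensive hook: return a per-cell rewrite function (which
         * may return null to drop a cell), or null to skip the scan entirely.
         */
        default UnaryOperator<String> cellRewriter(String family) {
            return null;
        }
    }

    /** Simulated load path: empty if vetoed, else the cells that would be stored. */
    static Optional<List<String>> bulkLoad(BulkLoadObserver observer, String family,
                                           List<String> hfilePaths, List<String> cells) {
        if (!observer.preBulkLoad(family, hfilePaths)) {
            return Optional.empty(); // vetoed cheaply; no file contents were read
        }
        UnaryOperator<String> rewriter = observer.cellRewriter(family);
        if (rewriter == null) {
            return Optional.of(cells); // fast path: accepted as-is, no scan
        }
        List<String> out = new ArrayList<>();
        for (String cell : cells) { // slow path: full scan over every cell
            String rewritten = rewriter.apply(cell);
            if (rewritten != null) {
                out.add(rewritten);
            }
        }
        return Optional.of(out);
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/staging/cf/hfile-0001"); // hypothetical path
        List<String> cells = List.of("a", "b", "c");

        // Vetoes loads into a "protected" family and never scans.
        BulkLoadObserver gate = new BulkLoadObserver() {
            @Override
            public boolean preBulkLoad(String family, List<String> hfilePaths) {
                return !family.equals("protected");
            }
        };
        // Opts into the scan and drops cell "b".
        BulkLoadObserver filter = new BulkLoadObserver() {
            @Override
            public UnaryOperator<String> cellRewriter(String family) {
                return cell -> cell.equals("b") ? null : cell;
            }
        };

        System.out.println(bulkLoad(gate, "protected", paths, cells)); // Optional.empty
        System.out.println(bulkLoad(gate, "cf", paths, cells));        // Optional[[a, b, c]]
        System.out.println(bulkLoad(filter, "cf", paths, cells));      // Optional[[a, c]]
    }
}
```

Keeping the scan opt-in means an observer that only needs to accept or reject a load never pays for reading a potentially very large set of files.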