[ https://issues.apache.org/jira/browse/HADOOP-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221856#comment-13221856 ]
Steve Loughran commented on HADOOP-8126:
----------------------------------------
You could move this to the HBase JIRA if you want.
> [Coprocessors] Add hooks for bulk loading actions
> -------------------------------------------------
>
> Key: HADOOP-8126
> URL: https://issues.apache.org/jira/browse/HADOOP-8126
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Andrew Purtell
>
> The API gap for bulk HFile loading was discussed on the mailing list but it
> didn't make it into a JIRA. It also came up on HBASE-5498.
> See http://search-hadoop.com/m/eEUHK1s4fo81/bulk+loading+and+RegionObservers
> The salient detail:
> {quote}
> A simple and straightforward course of action is to give the CP the option of
> rewriting the submitted store file(s) before the regionserver attempts to
> validate and move them into the store. This is similar to how CPs are hooked
> into compaction: CPs hook compaction by allowing one to wrap the scanner that
> is iterating over the store files. So the wrapper gets a chance to examine
> the KeyValues being processed and also has an opportunity to modify or drop
> them.
> Similarly for incoming HFiles for bulk load, the CP could be given a scanner
> iterating over those files, if you had a RegionObserver installed. You would
> be given the option in effect to rewrite the incoming HFiles before they are
> handed over to the RegionServer for addition to the region.
> {quote}
> I think this is a reasonable approach to interface design, because being
> handed a scanner highlights the bulk nature of the input. However, I think
> there should be two hooks here: one that allows for a simple yes/no answer
> as to whether the bulk load should proceed, and one that allows for more
> expensive filtering or transformation via a scanner-like interface. Bulk
> loads can be very large, so always requiring a scan over them is not a good
> idea.
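
To make the two-hook proposal concrete, here is a minimal sketch in plain Java. It does not use the HBase API: `Cell`, `BulkLoadObserver`, `preBulkLoad`, `wrapBulkLoadScanner` and `bulkLoad` are hypothetical names invented for illustration, and an `Iterator` stands in for a scanner over the incoming HFiles. The point is only that the cheap yes/no hook runs first, and a scan happens only when an observer asks for one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BulkLoadHookSketch {

    /** Hypothetical stand-in for a KeyValue; not the real HBase class. */
    public static final class Cell {
        public final String row;
        public final String value;
        public Cell(String row, String value) { this.row = row; this.value = value; }
    }

    /** Hypothetical observer exposing the two hooks described above. */
    public interface BulkLoadObserver {
        // Cheap hook: a yes/no answer based on the file paths alone.
        default boolean preBulkLoad(List<String> hfilePaths) { return true; }

        // Expensive hook: wrap the scanner to filter or transform cells.
        // Returning null means "no rewrite wanted", so no scan is forced.
        default Iterator<Cell> wrapBulkLoadScanner(Iterator<Cell> scanner) { return null; }
    }

    /** Simulated region server side: returns the cells that would be loaded. */
    public static List<Cell> bulkLoad(BulkLoadObserver observer,
                                      List<String> paths, List<Cell> incoming) {
        if (!observer.preBulkLoad(paths)) {
            return Collections.emptyList();      // load vetoed without scanning
        }
        Iterator<Cell> wrapped = observer.wrapBulkLoadScanner(incoming.iterator());
        if (wrapped == null) {
            return incoming;                     // files accepted as submitted
        }
        List<Cell> rewritten = new ArrayList<>();
        wrapped.forEachRemaining(rewritten::add); // the scan happens only here
        return rewritten;
    }

    /** Example wrapper: lazily drops cells whose row key starts with a prefix. */
    public static Iterator<Cell> dropRowsWithPrefix(final Iterator<Cell> in,
                                                    final String prefix) {
        return new Iterator<Cell>() {
            private Cell next = advance();

            private Cell advance() {
                while (in.hasNext()) {
                    Cell c = in.next();
                    if (!c.row.startsWith(prefix)) return c;
                }
                return null;                     // underlying scanner exhausted
            }

            @Override public boolean hasNext() { return next != null; }

            @Override public Cell next() {
                if (next == null) throw new NoSuchElementException();
                Cell out = next;
                next = advance();
                return out;
            }
        };
    }

    public static void main(String[] args) {
        List<Cell> incoming = Arrays.asList(new Cell("tmp-1", "a"), new Cell("row-2", "b"));
        BulkLoadObserver filtering = new BulkLoadObserver() {
            @Override public Iterator<Cell> wrapBulkLoadScanner(Iterator<Cell> scanner) {
                return dropRowsWithPrefix(scanner, "tmp-");
            }
        };
        List<Cell> loaded = bulkLoad(filtering, Arrays.asList("/staging/f1"), incoming);
        System.out.println("cells loaded after filtering: " + loaded.size());
    }
}
```

An observer that overrides only `preBulkLoad` never pays for a scan, which is the case the last sentence of the description is concerned about. Whether a real hook should signal "no rewrite" with null, or with a separate flag, is a design choice this sketch does not settle.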