[ https://issues.apache.org/jira/browse/HBASE-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487587#comment-13487587 ]

Anoop Sam John commented on HBASE-5500:
---------------------------------------

This is already done as part of HBASE-6224, right?

                
> [Coprocessors] Add hooks for bulk loading actions
> -------------------------------------------------
>
>                 Key: HBASE-5500
>                 URL: https://issues.apache.org/jira/browse/HBASE-5500
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors
>            Reporter: Andrew Purtell
>
> The API gap for bulk HFile loading was discussed on the mailing list but it 
> didn't make it into a JIRA. It also came up on HBASE-5498.
> See http://search-hadoop.com/m/eEUHK1s4fo81/bulk+loading+and+RegionObservers
> The salient detail:
> {quote}
>     A simple and straightforward course of action is to give the CP the 
> option of rewriting the submitted store file(s) before the regionserver 
> attempts to validate and move them into the store. This is similar to how CPs 
> are hooked into compaction: CPs hook compaction by allowing one to wrap the 
> scanner that is iterating over the store files. So the wrapper gets a chance 
> to examine the KeyValues being processed and also has an opportunity to 
> modify or drop them.
>     Similarly for incoming HFiles for bulk load, the CP could be given a 
> scanner iterating over those files, if you had a RegionObserver installed. 
> You would be given the option in effect to rewrite the incoming HFiles before 
> they are handed over to the RegionServer for addition to the region.
> {quote}
> I think this is a reasonable approach to interface design, because the fact 
> that you are given a scanner highlights the bulk nature of the input. However, 
> I think there should be two hooks here: one that gives a simple yes/no 
> answer as to whether the bulk load should proceed, and one that allows for 
> more expensive filtering, transformation, or whatever else via a scanner-like 
> interface. Bulk loads can be very large, so always requiring a scan over 
> them is not a good idea.
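The two-hook design proposed above can be sketched as a small, self-contained Java simulation. This is not the HBase coprocessor API: `Cell`, `BulkLoadObserver`, `preBulkLoadCheck`, `wrapBulkLoadScanner`, `bulkLoad`, and `DropTempRowsObserver` are hypothetical names chosen for illustration. The sketch shows the cheap yes/no gate running first, so a rejected load never reads its cells, and the scanner-wrapping hook lazily dropping cells as the (simulated) region server iterates over the incoming file, analogous to how CPs wrap the compaction scanner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical simulation of the two proposed bulk-load hooks; not the HBase API. */
final class BulkLoadHooksSketch {

    /** Hypothetical stand-in for a KeyValue: a row key plus a value. */
    public static final class Cell {
        public final String row;
        public final String value;

        public Cell(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Hypothetical observer exposing the two proposed hooks. */
    public interface BulkLoadObserver {
        /** Cheap yes/no gate: decides from the file paths alone, without scanning. */
        default boolean preBulkLoadCheck(List<String> hfilePaths) {
            return true;
        }

        /** Expensive hook: wraps the scanner so cells can be inspected, rewritten, or dropped. */
        default Iterator<Cell> wrapBulkLoadScanner(Iterator<Cell> scanner) {
            return scanner; // default: pass every cell through unchanged
        }
    }

    /**
     * Simulated region-server side: run the cheap check first and only scan the
     * incoming cells, through the observer's wrapper, if the check passes.
     * Returns the cells that would be added to the store, or null if rejected.
     */
    public static List<Cell> bulkLoad(BulkLoadObserver observer, List<String> hfilePaths, List<Cell> incoming) {
        if (!observer.preBulkLoadCheck(hfilePaths)) {
            return null; // rejected without reading any cells
        }
        List<Cell> loaded = new ArrayList<>();
        Iterator<Cell> scanner = observer.wrapBulkLoadScanner(incoming.iterator());
        while (scanner.hasNext()) {
            loaded.add(scanner.next());
        }
        return loaded;
    }

    /** Example observer: rejects files under /untrusted/ and lazily drops rows prefixed "tmp-". */
    public static final class DropTempRowsObserver implements BulkLoadObserver {
        @Override
        public boolean preBulkLoadCheck(List<String> hfilePaths) {
            for (String path : hfilePaths) {
                if (path.startsWith("/untrusted/")) {
                    return false;
                }
            }
            return true;
        }

        @Override
        public Iterator<Cell> wrapBulkLoadScanner(Iterator<Cell> scanner) {
            return new Iterator<Cell>() {
                private Cell pending; // next cell that passed the filter, or null

                @Override
                public boolean hasNext() {
                    // Pull from the underlying scanner until a cell survives the filter.
                    while (pending == null && scanner.hasNext()) {
                        Cell c = scanner.next();
                        if (!c.row.startsWith("tmp-")) {
                            pending = c;
                        }
                    }
                    return pending != null;
                }

                @Override
                public Cell next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    Cell c = pending;
                    pending = null;
                    return c;
                }
            };
        }
    }
}
```

Because the default `wrapBulkLoadScanner` returns the scanner unchanged, an observer that only overrides the cheap check adds no per-cell cost, which is the point of keeping the two hooks separate for very large bulk loads.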
