[ https://issues.apache.org/jira/browse/CRUNCH-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615415#comment-14615415 ]
Josh Wills commented on CRUNCH-536:
-----------------------------------

[~janvanbesien] sorry for the lag, 4th of July weekend over here. A couple of thoughts:

1. I think we can safely modify the Hook interface to accept the MRJob class as an argument, a la "void run(MRJob job)". I generally don't like to modify the public interfaces of the API, but as I read through the code, there's really no way to use the Hooks outside of the context of CrunchControlledJob, so I think it's okay to do, and it avoids the overhead of having two "hook" interfaces.

2. For clarity, I prefer "MRPipeline addPrepareHook(Hook hook)" and "MRPipeline addCompletionHook(Hook hook)" to adding constructor arguments.

3. We'll need some sort of "CompositeHook" class to make that work with the existing Prepare and CompletionHook in CrunchJobHooks. We can guarantee that the hooks will be executed in the order in which they are added, with the condition that Crunch's built-in Hooks will always run first.

What do you think?

> crunch jobs fail to use hbase api of secured hbase
> --------------------------------------------------
>
>                 Key: CRUNCH-536
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-536
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Jan Van Besien
>         Attachments: CRUNCH-536.patch
>
>
> When accessing a secured hbase from within a mapreduce job, it is required
> that the hbase credentials are initialized on the job before it is
> submitted. This can be done with TableMapReduceUtil.initCredentials(job).
> In case the job is the consequence of using HBaseSourceTarget, crunch-hbase
> can take care of it, see CRUNCH-535.
> However, it is also possible to write DoFns that use the HBase api directly,
> without using hbase input/output format. As an example use case, consider a
> job that bulk writes data to hbase by writing HFiles on HDFS which are later
> to be loaded into HBase. Such a job doesn't read or write from/to hbase using
> an input/output format directly, but it might still require access to other
> tables in HBase, for example auxiliary tables with metadata specific to the
> application.
> We can of course not expect crunch-core to call initCredentials (which is
> HBase specific) on all jobs, just in case, but it would be nice to be able to
> register a callback on the MRPipeline which is applied to every job before it
> is submitted, to cover this use case.
> I will provide a patch which will help to explain what I am suggesting here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
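
For illustration only, here is a rough sketch of the hook shape discussed in the comment above: a Hook whose run method receives the job, and a composite that runs Crunch's built-in hooks first and user hooks afterwards in the order they were added. Everything in it is an assumption made for the example. The MRJob and Hook types are local stand-ins rather than the real Crunch classes, and CompositeHook, its constructor, and its add method are hypothetical names, not the contents of the attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types are the real Crunch classes.
class CompositeHookSketch {

    /** Stand-in for Crunch's MRJob; only a job name is modeled here. */
    interface MRJob {
        String getJobName();
    }

    /** Hook shape suggested in point 1: the job is passed to run(). */
    interface Hook {
        void run(MRJob job) throws IOException;
    }

    /** Runs the built-in hooks first, then user hooks in insertion order. */
    static final class CompositeHook implements Hook {
        private final List<Hook> builtIn = new ArrayList<>();
        private final List<Hook> user = new ArrayList<>();

        CompositeHook(Hook... builtInHooks) {
            builtIn.addAll(Arrays.asList(builtInHooks));
        }

        /** Appends a user hook; returns this so calls can be chained. */
        CompositeHook add(Hook hook) {
            user.add(hook);
            return this;
        }

        @Override
        public void run(MRJob job) throws IOException {
            for (Hook hook : builtIn) {
                hook.run(job);
            }
            for (Hook hook : user) {
                hook.run(job);
            }
        }
    }

    /** Records the order in which the hooks fire for a single job. */
    static List<String> demo() {
        List<String> order = new ArrayList<>();
        CompositeHook prepare =
            new CompositeHook(job -> order.add("built-in:" + job.getJobName()));
        // A real user hook might initialize HBase credentials on the job here,
        // as the issue description suggests; this one only records its name.
        prepare.add(job -> order.add("user-1:" + job.getJobName()))
               .add(job -> order.add("user-2:" + job.getJobName()));
        try {
            prepare.run(() -> "job-a");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Under this sketch, methods like the proposed addPrepareHook and addCompletionHook would presumably delegate to one such composite per phase, which is what would make the "built-ins first, then insertion order" guarantee straightforward to keep.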