[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646289#comment-13646289
 ] 

Nick Dimiduk commented on PIG-3285:
-----------------------------------

bq. Not too many code paths.

Sure there are. Both Pig and HBase replicate the behavior of 
{{ToolRunner}}'s libjars argument for shipping jars with a job. They each do so 
in slightly different ways, so we end up with three different code paths. I'd 
prefer consolidating on a single code path.

bq. Filters out pig and hadoop classes from the list of classes so that pig and 
hadoop jar are not included.

We can add a method, something like {{addHBaseDependencyJars(Job)}}, which will 
add only HBase and its dependency jars (currently: zookeeper, protobuf, 
guava), nothing else. That way, we're not including any redundant Pig or Hadoop 
jars, and HBase is managing its own dependencies (meaning Pig won't have to 
change every time we change something). This is effectively the same as doing 
what you say above, "Also ensure that you add HTable.class apart from 
Zookeeper, inputformat, input/output key/value, partitioner and combiner 
classes," that is, omitting inputformat, keys, values, partitioner, and 
combiner. Does that sound like it'll accomplish what this filter intends?
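To make the proposal concrete, here is a minimal, self-contained sketch of the mechanism such a method would rely on: locating the jar a class was loaded from, then shipping that jar with the job. The {{findContainingJar}} helper and the commented-out {{addHBaseDependencyJars}} body are illustrative assumptions, not the actual HBase implementation; the class list (HTable, ZooKeeper, protobuf's Message, guava's Lists) just mirrors the dependencies named above.

```java
import java.net.URL;

public class DependencyJarSketch {

    // Locate the jar a class was loaded from. This is the same general trick
    // the Hadoop/HBase tmpjars plumbing uses; the implementation here is an
    // illustrative sketch, not the real helper.
    static String findContainingJar(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null; // bootstrap classes (e.g. java.lang.String) ship with the JDK
        }
        String resource = clazz.getName().replace('.', '/') + ".class";
        URL url = loader.getResource(resource);
        if (url != null && "jar".equals(url.getProtocol())) {
            String path = url.getPath();
            // jar URLs look like "file:/path/to/foo.jar!/com/Foo.class"
            return path.substring(0, path.indexOf('!')).replaceFirst("^file:", "");
        }
        return null; // class came from a directory, not a jar
    }

    // A hypothetical addHBaseDependencyJars(Job) would then resolve the jar
    // for each of HBase's own dependencies and append it to the job's
    // "tmpjars" configuration, e.g. (not compilable without the HBase deps):
    //
    //   for (Class<?> c : new Class<?>[] { HTable.class, ZooKeeper.class,
    //                                      Message.class, Lists.class }) {
    //       addJarToTmpJars(job.getConfiguration(), findContainingJar(c));
    //   }

    public static void main(String[] args) {
        // JDK classes are not shipped in user jars, so nothing is found for them:
        System.out.println(findContainingJar(String.class));
    }
}
```

The point of keeping the class list inside HBase is exactly the one above: HBase owns the list, so downstream projects never chase its dependency changes.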

bq. Find the jars for the other classes and filter out any jars already present 
in PigContext.extrajars and add only the rest to tmpjars.

How do we access the PigContext? Is it in the jobConf or some such? I'd rather 
not put Pig-specific code in the bowels of HBase mapreduce code; my preference 
is to build generic APIs that can be used across the board.
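Whatever layer does the filtering, the tmpjars bookkeeping itself is simple: merge new jar paths into the existing comma-separated value, skipping anything already present. A minimal sketch of that merge, with the Hadoop {{Configuration}} replaced by plain strings so it stands alone (the method name {{mergeTmpJars}} is made up for illustration):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class TmpJarsMerge {

    // Merge jar paths into an existing comma-separated "tmpjars" value,
    // skipping duplicates (e.g. jars already shipped via extrajars) while
    // preserving insertion order.
    static String mergeTmpJars(String existing, List<String> jars) {
        LinkedHashSet<String> merged = new LinkedHashSet<>();
        if (existing != null && !existing.isEmpty()) {
            merged.addAll(Arrays.asList(existing.split(",")));
        }
        merged.addAll(jars); // set semantics drop anything already present
        return String.join(",", merged);
    }

    public static void main(String[] args) {
        String tmpjars = mergeTmpJars("/lib/pig.jar",
                Arrays.asList("/lib/hbase.jar", "/lib/zookeeper.jar", "/lib/pig.jar"));
        System.out.println(tmpjars); // /lib/pig.jar,/lib/hbase.jar,/lib/zookeeper.jar
    }
}
```

If this dedup lives in a generic helper, the Pig-specific part reduces to passing in whatever PigContext.extrajars already contains, with no Pig code inside HBase.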

HBase APIs are designed to assist people writing raw MR jobs against HBase 
(i.e., including key/value classes, input/output format classes, etc.). The 
slightly different requirements of Pig and Hive need to be addressed as well.
                
> Jobs using HBaseStorage fail to ship dependency jars
> ----------------------------------------------------
>
>                 Key: PIG-3285
>                 URL: https://issues.apache.org/jira/browse/PIG-3285
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.11.1
>
>         Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>       at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:266)
>       at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>       at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>       at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>       at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>       at $Proxy7.getProtocolVersion(Unknown Source)
>       at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira