[
https://issues.apache.org/jira/browse/HCATALOG-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421884#comment-13421884
]
Jaeho Shin edited comment on HCATALOG-137 at 7/25/12 12:02 AM:
---------------------------------------------------------------
I'm using HCatalog from Giraph to read and write Hive tables. My solution to
this problem was to add every jar listed in the HADOOP_CLASSPATH environment
variable to the "tmpjars" job configuration property. This is essentially the
same as shipping all the jars used by the client to the workers with hadoop's
-libjars option, except that it happens automatically when the job is set up.
It uploads a lot of jars (many of them unnecessary), but it works well in
almost every case.
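
For readers who want to try the same workaround, here is a minimal sketch of
the idea, not the actual Giraph/HCatalog code. The class and method names
(TmpJarsSketch, toTmpJars) are hypothetical, and the sketch assumes that
"tmpjars" holds a comma-separated list of jar URIs. It only builds that
string; wildcard and directory entries in HADOOP_CLASSPATH are skipped for
simplicity.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class TmpJarsSketch {

    /**
     * Builds a comma-separated list of jar URIs from a classpath string.
     * Hypothetical helper: "existing" is whatever the job configuration
     * already holds under "tmpjars" (may be null or empty).
     */
    static String toTmpJars(String classpath, String existing) {
        List<String> jars = new ArrayList<>();
        if (existing != null && !existing.isEmpty()) {
            jars.add(existing); // keep jars that were already registered
        }
        if (classpath != null) {
            for (String entry : classpath.split(File.pathSeparator)) {
                // Only plain .jar entries are handled; directories and
                // wildcard entries such as /lib/* are ignored in this sketch.
                if (entry.endsWith(".jar")) {
                    jars.add(new File(entry).toURI().toString());
                }
            }
        }
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        String tmpjars = toTmpJars(System.getenv("HADOOP_CLASSPATH"), null);
        // In a real job setup, this value would be stored in the job
        // configuration under the "tmpjars" key before submitting the job.
        System.out.println(tmpjars);
    }
}
```

In an actual job the resulting string would be merged into the job's
configuration under "tmpjars" before submission, which is why the helper
accepts the existing value as a second argument.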
> hcatalog.jar is independent of libraries like metastore and thrift when it's
> running on the slaves side of a cluster
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HCATALOG-137
> URL: https://issues.apache.org/jira/browse/HCATALOG-137
> Project: HCatalog
> Issue Type: Improvement
> Components: pig
> Affects Versions: 0.2
> Reporter: Min Zhou
> Priority: Critical
> Attachments: HCAT-137-v1.diff
>
>
> At present, suppose we run a Pig script like the one below without
> registering hive-metastore.jar or libthrift.jar:
> {noformat}
> A = LOAD 'orders' USING org.apache.hcatalog.pig.HCatLoader();
> B = FOREACH A GENERATE o_custkey;
> C = LIMIT B 10;
> DUMP C;
> {noformat}
> Each mapper throws an exception like the one below:
> {noformat}
> java.lang.RuntimeException: could not instantiate 'org.apache.hcatalog.pig.HCatLoader' with arguments 'null'
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:504)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:154)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:106)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:594)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:308)
>     at org.apache.hadoop.mapred.Child.main(Child.java:156)
> Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException
>     at org.apache.hcatalog.pig.HCatLoader.<init>(HCatLoader.java:55)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at java.lang.Class.newInstance0(Class.java:355)
>     at java.lang.Class.newInstance(Class.java:308)
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:474)
>     ... 5 more
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.api.NoSuchObjectException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     ... 13 more
> {noformat}
> In principle, HCatLoader/HCatStorer need the Hive metastore and Thrift
> libraries only when running on the client side; they have no use for them
> on the slave side. Scripts should therefore not need to register those
> jars, and the jars should not be distributed to the nodes where MR tasks
> run.