[ 
https://issues.apache.org/jira/browse/HCATALOG-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146394#comment-13146394
 ] 

Julien Le Dem commented on HCATALOG-137:
----------------------------------------

I noticed that HCat connects to the Metastore from the slaves during the job 
cleanup step in order to register the newly created partition, so the Thrift 
library is in fact needed there.
However, we still need a proper way of using HCatalog with Pig without 
registering every dependency by hand. I got it working by building a fat 
hcatalog jar that bundles its dependencies, but there should be a better way. 
For example, HCat could register its dependencies automatically.
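
One way automatic registration could start is by asking the JVM which jar each 
required class was loaded from on the client. The following is only a minimal 
sketch of that lookup: the class DependencyJars and its methods are 
hypothetical names, not part of HCatalog or Pig, and how the resulting 
locations would be handed to the job is left open.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper (not an HCatalog or Pig class). */
final class DependencyJars {

    /** Returns the jar or directory a class was loaded from, if the JVM reports one. */
    static Optional<URL> locate(Class<?> cls) {
        // Classes loaded by the bootstrap loader (e.g. java.lang.String)
        // have no CodeSource, so there is nothing to ship for them.
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null
                ? Optional.empty()
                : Optional.ofNullable(source.getLocation());
    }

    /** Collects the distinct locations of the given classes, in argument order. */
    static Set<String> locateAll(Class<?>... classes) {
        // Strings are stored instead of URLs because URL.equals may resolve hosts.
        Set<String> locations = new LinkedHashSet<>();
        for (Class<?> cls : classes) {
            locate(cls).ifPresent(url -> locations.add(url.toString()));
        }
        return locations;
    }

    public static void main(String[] args) {
        // On a real client the arguments would be classes from the metastore
        // and Thrift jars; these two are placeholders for illustration.
        System.out.println(locateAll(DependencyJars.class, String.class));
    }
}
```

On a client with the Hive metastore and Thrift jars on its classpath, passing 
one class from each jar to locateAll would yield the jar locations that a 
script currently has to register by hand.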
                
> hcatalog.jar is independent of libraries like metastore and thrift when it's 
> running on the slaves side of a cluster
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HCATALOG-137
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-137
>             Project: HCatalog
>          Issue Type: Improvement
>          Components: pig
>    Affects Versions: 0.2
>            Reporter: Min Zhou
>            Priority: Critical
>             Fix For: 0.3
>
>         Attachments: HCAT-137-v1.diff
>
>
> At present, if we run a Pig script like the one below without registering 
> hive-metastore.jar or libthrift.jar:
> {noformat}
> A = LOAD 'orders' USING org.apache.hcatalog.pig.HCatLoader(); 
> B = FOREACH A GENERATE o_custkey;
> C = LIMIT B 10;
> DUMP C; 
> {noformat}
> each mapper throws an exception like the one below:
> {noformat}
> java.lang.RuntimeException: could not instantiate 'org.apache.hcatalog.pig.HCatLoader' with arguments 'null'
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:504)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:154)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:106)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:594)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:308)
>     at org.apache.hadoop.mapred.Child.main(Child.java:156)
> Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException
>     at org.apache.hcatalog.pig.HCatLoader.(HCatLoader.java:55)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at java.lang.Class.newInstance0(Class.java:355)
>     at java.lang.Class.newInstance(Class.java:308)
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:474)
>     ... 5 more
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.api.NoSuchObjectException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     ... 13 more
> {noformat}
> In principle, the Hive metastore and Thrift libraries are needed by 
> HCatLoader/HCatStorer only when they run on the client side; they serve no 
> purpose on the slave side. Scripts should not have to register those jars, 
> and the jars should not be distributed to the nodes where MR tasks run.
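
The NoClassDefFoundError above is thrown when a class the loader references 
is absent from the task's classpath. A small check such as the sketch below 
can confirm which classes are visible in a given JVM. ClasspathCheck is a 
hypothetical name, and org.apache.thrift.TException is used only as an 
assumed representative class from libthrift.

```java
/** Hypothetical diagnostic (not an HCatalog or Pig class). */
final class ClasspathCheck {

    /** Returns true if the named class can be found without initializing it. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false avoids running static initializers during the probe.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError for missing transitive classes.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            // Class named in the stack trace above.
            "org.apache.hadoop.hive.metastore.api.NoSuchObjectException",
            // Assumption: a representative class from libthrift.
            "org.apache.thrift.TException"
        };
        for (String name : required) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```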
