[ 
https://issues.apache.org/jira/browse/GOBBLIN-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Tiwari updated GOBBLIN-156:
------------------------------------
    Component/s: gobblin-kafka

> Gobblin not working with KafkaSource and mapreduce
> --------------------------------------------------
>
>                 Key: GOBBLIN-156
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-156
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-kafka
>            Reporter: Abhishek Tiwari
>              Labels: Bug:LaunchIssue, Source:Kafka
>
> Hi, 
> I'm trying to launch gobblin-mapreduce.sh with my job config, which is almost 
> a copy/paste from your wiki: 
> https://github.com/linkedin/gobblin/wiki/Kafka-HDFS-Ingestion
> I'm launching Gobblin with the command:
> ```
> bin/gobblin-mapreduce.sh  --conf jobs/dump-kafka.properties --workdir work/
> ```
> But the job fails with the following repeated error in all mappers:
> ```
> java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
>     at gobblin.source.extractor.extract.kafka.KafkaWrapper$KafkaOldAPI.createFetchRequest(KafkaWrapper.java:401)
>     at gobblin.source.extractor.extract.kafka.KafkaWrapper$KafkaOldAPI.fetchNextMessageBuffer(KafkaWrapper.java:333)
>     at gobblin.source.extractor.extract.kafka.KafkaWrapper.fetchNextMessageBuffer(KafkaWrapper.java:136)
>     at gobblin.source.extractor.extract.kafka.KafkaExtractor.fetchNextMessageBuffer(KafkaExtractor.java:239)
>     at gobblin.source.extractor.extract.kafka.KafkaExtractor.readRecordImpl(KafkaExtractor.java:125)
>     at gobblin.instrumented.extractor.InstrumentedExtractorBase.readRecord(InstrumentedExtractorBase.java:121)
>     at gobblin.instrumented.extractor.InstrumentedExtractor.readRecord(InstrumentedExtractor.java:34)
>     at gobblin.runtime.LimitingExtractorDecorator.readRecord(LimitingExtractorDecorator.java:69)
>     at gobblin.instrumented.extractor.InstrumentedExtractorDecorator.readRecordImpl(InstrumentedExtractorDecorator.java:64)
>     at gobblin.instrumented.extractor.InstrumentedExtractorDecorator.readRecord(InstrumentedExtractorDecorator.java:57)
>     at gobblin.runtime.Task.run(Task.java:169)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 14 more
> ```
> It seems that Gobblin does not include the Kafka (and other) jars in the 
> MapReduce tasks' classpath. 
> I also tried adding all the jars in the lib/ directory to libjars with the 
> command:
> ```
> bin/gobblin-mapreduce.sh --conf jobs/dump-kafka.properties --workdir work/ \
>   --jars `ls lib/* | tr '\n' ','`
> ```
> But this time I get an error caused by clashing Guava libraries:
> ```
> Error: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
>         ... 7 more
> Caused by: java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
>         at gobblin.configuration.SourceState.<clinit>(SourceState.java:54)
>         at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.<init>(MRJobLauncher.java:554)
>         ... 12 more
> ```
> I have hadoop 2.4.0, which uses guava 11.0.2, while the one in lib/ is 
> guava-15.0. 
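
One way to confirm a version clash like the one described above is to list every copy of an artifact across the directories that end up on the classpath. The following is only a sketch: `find_artifact_versions` is a hypothetical helper, not part of Gobblin or Hadoop, and it matches on jar file names only (it does not look inside jars, so a library bundled into another jar will not show up).

```shell
# find_artifact_versions ARTIFACT DIR...
# Prints every jar named ARTIFACT-<version>.jar found under the given
# directories, one path per line, sorted. Hypothetical helper for illustration.
find_artifact_versions() (
  artifact="$1"
  shift
  # The version is assumed to start with a digit, e.g. guava-15.0.jar.
  find "$@" -type f -name "${artifact}-[0-9]*.jar" 2>/dev/null | sort
)
```

For example, `find_artifact_versions guava lib "$HADOOP_HOME/share/hadoop/common/lib"` would print both Guava jars if the two versions mentioned above are present. The Hadoop directory used here is an assumption about the layout of a stock Hadoop 2 tarball and may differ on vendor distributions.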
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/386 
> *Github Reporter* : *kzarzycki-advertine* 
> *Github Created At* : 2015-10-15T07:29:37Z 
> *Github Updated At* : 2016-03-10T00:36:08Z 
> h3. Comments 
> ----
> *kzarzycki* wrote on 2015-10-17T06:41:39Z : Hey, does anyone have comments on 
> this ticket? I'd be grateful for your help with this. Thank you!
> Krzysztof
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-148891116 
> ----
> *zliu41* wrote on 2015-10-19T19:05:02Z : Hi @kzarzycki, it seems the jars in 
> `lib` were somehow not correctly added to the Hadoop classpath. I couldn't 
> reproduce your errors (if you run `gobblin-mapreduce.sh` from the parent dir of 
> `lib` it should work automatically), so I can only make some guesses. In 
> `gobblin-mapreduce.sh`, can you replace the line 
> `export HADOOP_CLASSPATH=$GOBBLIN_DEP_JARS:$HADOOP_CLASSPATH`
> with one of the following:
> ```
> export HADOOP_CLASSPATH=lib:$HADOOP_CLASSPATH
> export HADOOP_CLASSPATH=lib
> export HADOOP_CLASSPATH=.:$HADOOP_CLASSPATH
> export HADOOP_CLASSPATH=.
> ```
> Then run `gobblin-mapreduce.sh` with or without option `--jars [path-to-lib]`.
> Not sure which combination is correct so you can try these options.
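
Because relative entries such as `lib` or `.` only resolve from the directory the script is started in, it can help to check each candidate `HADOOP_CLASSPATH` value before launching. The sketch below is a hypothetical helper (`check_classpath_entries` is not part of Gobblin or Hadoop); it only checks that each entry exists relative to the current directory and says nothing about what Hadoop ships to the mappers.

```shell
# check_classpath_entries CLASSPATH
# Prints "missing: ENTRY" for every colon-separated entry that does not
# exist relative to the current directory; exits non-zero if any is missing.
check_classpath_entries() (
  set -f          # do not glob-expand wildcard entries such as lib/*
  IFS=':'
  status=0
  for entry in $1; do
    [ -n "$entry" ] || continue
    case "$entry" in
      */\*) entry="${entry%/\*}" ;;   # dir/* means "all jars in dir"
    esac
    if [ ! -e "$entry" ]; then
      printf 'missing: %s\n' "$entry"
      status=1
    fi
  done
  exit "$status"
)
```

Running `check_classpath_entries "lib:$HADOOP_CLASSPATH"` from the parent directory of `lib` and again from another directory shows why the relative variants behave differently depending on where the script is started.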
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-149314749 
> ----
> *rsimiciuc* wrote on 2015-11-02T15:09:58Z : I have the same problem. Any 
> solution?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-153047233 
> ----
> *klyr* wrote on 2015-11-17T09:30:12Z : Hi @kzarzycki-advertine,
> I had the same problem and struggled a while to fix it.
> In my case it was a problem with the hive-exec library embedding the (not 
> shaded) guava library. It took precedence over the newer guava library.
> Here is the related JIRA issue: 
> https://issues.apache.org/jira/browse/HIVE-5733
> A quick fix is to remove `hive-exec-0.13.1.jar` or to leave it out of the 
> `--jars` option.
> Upgrading to a Hive version > 1.2.0 may also work.
> I hope this helps.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-157318430 
> ----
> *gilmichlin* wrote on 2015-11-18T16:41:23Z : I can confirm that I can 
> reproduce this on HDP 2.3.0, building with:
>  ./gradlew clean build -PuseHadoop2 -PhadoopVersion=2.7.1 -PhiveVersion=1.2.1
> Upgrading to Hive 1.2.1 did not work for me. 
> I just used:
>  --jars `ls lib/* | grep -v hive | tr '\n' ','` 
> and it worked. 
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-157771981 
> ----
> *zliu41* wrote on 2015-11-18T16:58:19Z : @klyr @gilmichlin thanks for 
> posting! I'll see if updating the hive version works.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-157778024 
> ----
> *zliu41* wrote on 2015-11-18T22:28:14Z : I've updated the hive version to 
> 1.2.1. #466 
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-157885351 
> ----
> *gilmichlin* wrote on 2015-11-18T22:37:41Z : 1.2.1 did not work for me with 
> HDP 2.3.0; only this did: 
> --jars `ls lib/* | grep -v hive | tr '\n' ','`
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-157887533 
> ----
> *zliu41* wrote on 2015-11-19T18:22:04Z : @gilmichlin is it still because of 
> the Guava dependency? Based on HIVE-5733 it shouldn't be a problem with Hive 
> 1.2.0 or later.
> If so, is there any hive version that works for you?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-158145837 
> ----
> *gilmichlin* wrote on 2015-11-20T18:46:14Z : I am going to check it out over 
> the weekend.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-158488974 
> ----
> *gilmichlin* wrote on 2015-11-23T18:58:26Z : It is still Guava. 
> You can reproduce it by loading an HDP 2.3.X VM and building with the 
> following:
> ```
> ./gradlew clean build -PuseHadoop2 -PhadoopVersion=2.7.1 -PhiveVersion=1.2.1
> ```
> Running the following Wikipedia example
> ```
> ./bin/gobblin-mapreduce.sh --conf /opt/gobblin/job/wikipedia.pull \
>   --workdir /user/root/gobblin/ --jars `ls lib/* | tr '\n' ','`
> ```
> gives the following error:
> ```
> 2015-11-23 18:50:15,857 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:132)
>     ... 7 more
> Caused by: java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
>     at gobblin.configuration.SourceState.<clinit>(SourceState.java:54)
>     at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.<init>(MRJobLauncher.java:525)
>     ... 12 more
> ```
> Listing the Hive jars:
> ```
> ls -l lib/ | grep hive
> -rw-r--r-- 1 root root    47713 2015-11-18 16:17 hive-ant-1.2.1.jar
> -rw-r--r-- 1 root root   292289 2015-11-18 16:17 hive-common-1.2.1.jar
> -rw-r--r-- 1 root root 20599029 2015-11-18 16:17 hive-exec-1.2.1.jar
> -rw-r--r-- 1 root root   100580 2015-11-18 16:17 hive-jdbc-1.2.1.jar
> -rw-r--r-- 1 root root  5505100 2015-11-18 16:17 hive-metastore-1.2.1.jar
> -rw-r--r-- 1 root root   916706 2015-11-18 16:17 hive-serde-1.2.1.jar
> -rw-r--r-- 1 root root  1878543 2015-11-18 16:17 hive-service-1.2.1.jar
> -rw-r--r-- 1 root root    32390 2015-11-18 16:17 hive-shims-0.20S-1.2.1.jar
> -rw-r--r-- 1 root root    60070 2015-11-18 16:17 hive-shims-0.23-1.2.1.jar
> -rw-r--r-- 1 root root     8949 2015-11-18 16:17 hive-shims-1.2.1.jar
> -rw-r--r-- 1 root root   108914 2015-11-18 16:17 hive-shims-common-1.2.1.jar
> -rw-r--r-- 1 root root    13065 2015-11-18 16:17 hive-shims-scheduler-1.2.1.jar
> ```
> Running it like this works:
> ```
> ./bin/gobblin-mapreduce.sh --conf /opt/gobblin/job/wikipedia.pull \
>   --workdir /user/root/gobblin/ --jars `ls lib/* | grep -v hive | tr '\n' ','`
> ```
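
The `ls | grep -v | tr` pipeline above can also be written as a small function, which avoids the trailing comma that `tr` leaves behind and keeps the exclusion pattern explicit. This is a sketch only: `build_jars_list` is a hypothetical helper, not part of Gobblin, and it assumes jar file names contain no commas.

```shell
# build_jars_list DIR [EXCLUDE_REGEX]
# Prints the *.jar files in DIR as one comma-separated line, skipping any
# jar whose file name matches the optional extended regex EXCLUDE_REGEX.
build_jars_list() (
  dir="$1"
  exclude="${2:-}"
  list=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue            # the glob matched nothing
    if [ -n "$exclude" ] && printf '%s\n' "${jar##*/}" | grep -Eq "$exclude"; then
      continue
    fi
    list="${list:+$list,}$jar"           # comma only between entries
  done
  printf '%s\n' "$list"
)
```

With it, the workaround reported in this comment would read `./bin/gobblin-mapreduce.sh --conf /opt/gobblin/job/wikipedia.pull --workdir /user/root/gobblin/ --jars "$(build_jars_list lib '^hive-')"`. Unlike `grep -v hive` applied to full paths, the pattern here is matched against the file name only.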
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-159027977 
> ----
> *rsimiciuc* wrote on 2015-11-23T19:11:31Z : I had the same problem 
> running Gobblin on CDH5, but I managed to solve it by shading Guava.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-159031634 
> ----
> *x10ba* wrote on 2015-12-05T01:32:53Z : Hi, I think my error is similar to 
> the one in this thread, so I'm putting it here (not sure if I need to change 
> my properties):
> Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: gobblin.source.extractor.extract.kafka.kafkaSimpleSource
> Current system: CentOS
> Invocation:
> [bin]$ ./gobblin-mapreduce.sh --conf ~/gobblin/gobblin-dist/conf/gobblin-mapreduce.properties --workdir ~/gobblin/work --jars ~/gobblin/gobblin-dist/lib/gobblin-core.jar
> kafkaSimpleSource lives in gobblin-core.jar.
> Thanks,
> x10ba
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-162124443 
> ----
> *qizongjun* wrote on 2016-03-09T23:24:22Z : Has anyone had any luck with 
> this? I am facing the same Kafka problem. The Kafka jar is present in 
> gobblin/lib, and it contains TopicAndPartition.class.
> I am using the latest Gobblin code.
> I tried removing hive-exec.jar too. It did not work for me. 
> 2016-03-09 22:35:01,845 ERROR [TaskExecutor-0] gobblin.runtime.Task: Task task_kafka2hdfs_1457562888703_1 failed
> java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
>     at gobblin.source.extractor.extract.kafka.KafkaWrapper$KafkaOldAPI.createFetchRequest(KafkaWrapper.java:401)
>     at gobblin.source.extractor.extract.kafka.KafkaWrapper$KafkaOldAPI.fetchNextMessageBuffer(KafkaWrapper.java:333)
>     at gobblin.source.extractor.extract.kafka.KafkaWrapper.fetchNextMessageBuffer(KafkaWrapper.java:136)
>     at gobblin.source.extractor.extract.kafka.KafkaExtractor.fetchNextMessageBuffer(KafkaExtractor.java:227)
>     at gobblin.source.extractor.extract.kafka.KafkaExtractor.readRecordImpl(KafkaExtractor.java:123)
>     at gobblin.instrumented.extractor.InstrumentedExtractorBase.readRecord(InstrumentedExtractorBase.java:121)
>     at gobblin.instrumented.extractor.InstrumentedExtractor.readRecord(InstrumentedExtractor.java:34)
>     at gobblin.instrumented.extractor.InstrumentedExtractorDecorator.readRecordImpl(InstrumentedExtractorDecorator.java:64)
>     at gobblin.instrumented.extractor.InstrumentedExtractorDecorator.readRecord(InstrumentedExtractorDecorator.java:57)
>     at gobblin.runtime.Task.run(Task.java:172)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 13 more
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-194562251 
> ----
> [~stakiar] wrote on 2016-03-10T00:36:08Z : Adding 
> `kafka_2.11-0.8.2.1.jar` to the `--jars` option when running 
> `bin/gobblin-mapreduce.sh` fixes this.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/386#issuecomment-194589153



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
