Abhishek Tiwari created GOBBLIN-23:
--------------------------------------
Summary: NoSuchMethodError when running on Google Dataproc
Key: GOBBLIN-23
URL: https://issues.apache.org/jira/browse/GOBBLIN-23
Project: Apache Gobblin
Issue Type: Bug
Reporter: Abhishek Tiwari
I'm trying to run Gobblin on Google Dataproc but I'm getting this
NoSuchMethodError and can't figure out how to solve it.
Waiting for job output...
...
Exception in thread main java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
Caused by: java.lang.NoSuchMethodError:
org.apache.commons.cli.Option.builder()Lorg/apache/commons/cli/Option$Builder;
at gobblin.runtime.cli.CliOption
...
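For context (an editorial note, not part of the original report): `Option.builder()` was only added in Commons CLI 1.3, while Hadoop 2.x distributions typically bundle Commons CLI 1.2, so an older jar appearing earlier on the classpath produces exactly this `NoSuchMethodError`. A minimal sketch of a filename-based version check — the `has_option_builder` helper and the heuristic of reading the version from the jar name are assumptions for illustration, not part of Gobblin or Dataproc:

```shell
# Sketch: guess from a commons-cli jar's filename whether it includes
# Option.builder() (introduced in Commons CLI 1.3).
has_option_builder() {
  ver=$(basename "$1" .jar)   # e.g. commons-cli-1.2
  ver=${ver#commons-cli-}     # e.g. 1.2
  major=${ver%%.*}
  rest=${ver#*.}
  minor=${rest%%.*}
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 3 ]; }
}

has_option_builder commons-cli-1.2.jar && echo yes || echo no    # prints "no"
has_option_builder commons-cli-1.3.1.jar && echo yes || echo no  # prints "yes"
```

Running this against the jars under the cluster's Hadoop lib directory would show whether an old Commons CLI is shadowing the one Gobblin needs.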
This same job (contents below) runs fine on my local Hadoop setup (on my
laptop) but fails on Dataproc. Has anyone ever attempted running Gobblin
on Dataproc?
Here's my Gobblin job file:
job.name=kafka2gcs
job.group=gkafka2gcs
job.description=Gobblin job to read messages from Kafka and save as is on GCS
job.lock.enabled=false
kafka.brokers=mykafka:9092
topic.whitelist=mytopic
bootstrap.with.offset=earliest
source.class=gobblin.source.extractor.extract.kafka.KafkaDeserializerSource
kafka.deserializer.type=BYTE_ARRAY
extract.namespace=nskafka2gcs
writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.destination.type=HDFS
mr.job.max.mappers=2
writer.output.format=txt
data.publisher.type=gobblin.publisher.BaseDataPublisher
metrics.enabled=false
fs.uri=file:///.
writer.fs.uri=${fs.uri}
mr.job.root.dir=gobblin
writer.output.dir=${mr.job.root.dir}/out
writer.staging.dir=${mr.job.root.dir}/stg
fs.gs.project.id=my-test-project
data.publisher.fs.uri=gs://my-bucket
state.store.fs.uri=${data.publisher.fs.uri}
data.publisher.final.dir=gobblin/pub
state.store.dir=gobblin/state
And these are the commands I issue for Dataproc:
gcloud dataproc clusters create myspark \
--image-version 1.1 \
--master-machine-type n1-standard-4 \
--master-boot-disk-size 10 \
--num-workers 2 \
--worker-machine-type n1-standard-4 \
--worker-boot-disk-size 10
gcloud dataproc jobs submit hadoop --cluster=myspark \
--class gobblin.runtime.mapreduce.CliMRJobLauncher \
--jars
/opt/gobblin-dist/lib/gobblin-runtime-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-api-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-avro-json-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-codecs-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-core-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-core-base-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-crypto-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-crypto-provider-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-data-management-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metastore-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metrics-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metrics-base-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metadata-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-utility-0.10.0.jar,/opt/gobblin-dist/lib/avro-1.8.1.jar,/opt/gobblin-dist/lib/avro-mapred-1.8.1.jar,/opt/gobblin-dist/lib/commons-lang3-3.4.jar,/opt/gobblin-dist/lib/config-1.2.1.jar,/opt/gobblin-dist/lib/data-2.6.0.jar,/opt/gobblin-dist/lib/gson-2.6.2.jar,/opt/gobblin-dist/lib/guava-15.0.jar,/opt/gobblin-dist/lib/guava-retrying-2.0.0.jar,/opt/gobblin-dist/lib/joda-time-2.9.3.jar,/opt/gobblin-dist/lib/javassist-3.18.2-GA.jar,/opt/gobblin-dist/lib/kafka_2.11-0.8.2.2.jar,/opt/gobblin-dist/lib/kafka-clients-0.8.2.2.jar,/opt/gobblin-dist/lib/metrics-core-2.2.0.jar,/opt/gobblin-dist/lib/metrics-core-3.1.0.jar,/opt/gobblin-dist/lib/metrics-graphite-3.1.0.jar,/opt/gobblin-dist/lib/scala-library-2.11.8.jar,/opt/gobblin-dist/lib/influxdb-java-2.1.jar,/opt/gobblin-dist/lib/okhttp-2.4.0.jar,/opt/gobblin-dist/lib/okio-1.4.0.jar,/opt/gobblin-dist/lib/retrofit-1.9.0.jar,/opt/gobblin-dist/lib/reflections-0.9.10.jar \
--properties mapreduce.job.user.classpath.first=true \
-- -jobconfig gs://my-bucket/gobblin-kafka-gcs.job
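A side note on maintaining that long `--jars` value: it could be generated from the lib directory instead of being hand-written. A minimal sketch, assuming the same `/opt/gobblin-dist/lib` layout; the `join_jars` helper name is made up for illustration:

```shell
# Sketch: build the comma-separated --jars value from gobblin-dist/lib
# instead of maintaining the long list by hand.
join_jars() {
  # List every jar in the given directory and join the paths with commas.
  ls "$1"/*.jar | paste -sd, -
}

# Usage sketch (path assumed):
#   gcloud dataproc jobs submit hadoop ... \
#     --jars "$(join_jars /opt/gobblin-dist/lib)" ...
```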
I have already tried copying all of Gobblin's lib jars into `/usr/lib/hadoop/lib`
on all machines of the Dataproc cluster, but that didn't work either.
Any ideas?
gobblin 0.10.0
hadoop 2.7.3
dataproc image 1.1
PS: I've also posted
[this](http://stackoverflow.com/questions/44036037/nosuchmethoderror-when-trying-to-run-gobblin-on-dataproc)
on Stack Overflow.
*Github Url* : https://github.com/linkedin/gobblin/issues/1877
*Github Reporter* : *hgabreu*
*Github Created At* : 2017-05-17T23:02:00Z
*Github Updated At* : 2017-05-17T23:30:16Z
h3. Comments
----
[~ibuenros] wrote on 2017-05-17T23:09:43Z : Hi Henrique,
This is due to conflicting versions of Commons CLI artifacts. Dataproc
probably has an old version on the classpath. The solutions are
platform-dependent in general, but might involve forcing a particular order
of the classpath. Unfortunately, I haven't worked with Dataproc before, so I
don't know exactly how you would solve this, but if you have more questions
I'm happy to answer.
Best,
Issac
*Github Url* :
https://github.com/linkedin/gobblin/issues/1877#issuecomment-302256290
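One platform-agnostic thing worth trying here (an editorial suggestion, untested on Dataproc): Hadoop 2.x also supports `mapreduce.job.classloader=true`, a standard property that runs the job with an isolated classloader preferring user jars over Hadoop's bundled ones. A sketch of the amended submit command, printed as a dry run rather than executed; whether this property alone resolves the conflict on Dataproc is an assumption:

```shell
# Sketch (untested on Dataproc): add mapreduce.job.classloader=true
# alongside the user.classpath.first flag already in the original command.
PROPS="mapreduce.job.user.classpath.first=true,mapreduce.job.classloader=true"

# Dry run: print the command instead of executing it.
echo gcloud dataproc jobs submit hadoop --cluster=myspark \
  --class gobblin.runtime.mapreduce.CliMRJobLauncher \
  --properties "$PROPS" \
  -- -jobconfig gs://my-bucket/gobblin-kafka-gcs.job
```

The `--jars` list from the original command would still be needed; it is omitted here for brevity.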
----
*hgabreu* wrote on 2017-05-17T23:29:29Z : Isn't there a way to build Gobblin
with all dependencies bundled in the jar, probably also shaded to avoid such
conflicts?
BTW, thanks a lot for your quick reply. Very helpful; I'm trying to sort out
how to do this on Dataproc.
*Github Url* :
https://github.com/linkedin/gobblin/issues/1877#issuecomment-302259199
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)