Abhishek Tiwari created GOBBLIN-23:
--------------------------------------

             Summary: NoSuchMethodError when running on Google Dataproc
                 Key: GOBBLIN-23
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-23
             Project: Apache Gobblin
          Issue Type: Bug
            Reporter: Abhishek Tiwari


I'm trying to run Gobblin on Google Dataproc, but I'm getting this 
NoSuchMethodError and can't figure out how to solve it.

    Waiting for job output...
    ...
    Exception in thread main java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            ...
    Caused by: java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder()Lorg/apache/commons/cli/Option$Builder;
            at gobblin.runtime.cli.CliOption
            ...

This same job (contents below) runs fine on my local Hadoop setup (on my 
laptop) but not on Dataproc. Has anyone ever attempted running Gobblin 
on Dataproc?

Here's my gobblin job file:

    job.name=kafka2gcs
    job.group=gkafka2gcs
    job.description=Gobblin job to read messages from Kafka and save as is on GCS
    job.lock.enabled=false
    
    kafka.brokers=mykafka:9092
    topic.whitelist=mytopic
    bootstrap.with.offset=earliest
    
    source.class=gobblin.source.extractor.extract.kafka.KafkaDeserializerSource
    kafka.deserializer.type=BYTE_ARRAY
    extract.namespace=nskafka2gcs
    
    writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
    writer.destination.type=HDFS
    mr.job.max.mappers=2
    writer.output.format=txt
    data.publisher.type=gobblin.publisher.BaseDataPublisher
    metrics.enabled=false
    
    fs.uri=file:///.
    writer.fs.uri=${fs.uri}
    mr.job.root.dir=gobblin
    writer.output.dir=${mr.job.root.dir}/out
    writer.staging.dir=${mr.job.root.dir}/stg
    
    fs.gs.project.id=my-test-project
    data.publisher.fs.uri=gs://my-bucket
    state.store.fs.uri=${data.publisher.fs.uri}
    data.publisher.final.dir=gobblin/pub
    state.store.dir=gobblin/state

And these are the commands I issue for dataproc:

    gcloud dataproc clusters create myspark \
      --image-version 1.1 \
      --master-machine-type n1-standard-4 \
      --master-boot-disk-size 10 \
      --num-workers 2 \
      --worker-machine-type n1-standard-4 \
      --worker-boot-disk-size 10 
    gcloud dataproc jobs submit hadoop --cluster=myspark \
      --class gobblin.runtime.mapreduce.CliMRJobLauncher \
      --jars /opt/gobblin-dist/lib/gobblin-runtime-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-api-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-avro-json-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-codecs-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-core-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-core-base-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-crypto-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-crypto-provider-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-data-management-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metastore-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metrics-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metrics-base-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metadata-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-utility-0.10.0.jar,/opt/gobblin-dist/lib/avro-1.8.1.jar,/opt/gobblin-dist/lib/avro-mapred-1.8.1.jar,/opt/gobblin-dist/lib/commons-lang3-3.4.jar,/opt/gobblin-dist/lib/config-1.2.1.jar,/opt/gobblin-dist/lib/data-2.6.0.jar,/opt/gobblin-dist/lib/gson-2.6.2.jar,/opt/gobblin-dist/lib/guava-15.0.jar,/opt/gobblin-dist/lib/guava-retrying-2.0.0.jar,/opt/gobblin-dist/lib/joda-time-2.9.3.jar,/opt/gobblin-dist/lib/javassist-3.18.2-GA.jar,/opt/gobblin-dist/lib/kafka_2.11-0.8.2.2.jar,/opt/gobblin-dist/lib/kafka-clients-0.8.2.2.jar,/opt/gobblin-dist/lib/metrics-core-2.2.0.jar,/opt/gobblin-dist/lib/metrics-core-3.1.0.jar,/opt/gobblin-dist/lib/metrics-graphite-3.1.0.jar,/opt/gobblin-dist/lib/scala-library-2.11.8.jar,/opt/gobblin-dist/lib/influxdb-java-2.1.jar,/opt/gobblin-dist/lib/okhttp-2.4.0.jar,/opt/gobblin-dist/lib/okio-1.4.0.jar,/opt/gobblin-dist/lib/retrofit-1.9.0.jar,/opt/gobblin-dist/lib/reflections-0.9.10.jar \
      --properties mapreduce.job.user.classpath.first=true \
      -- -jobconfig gs://my-bucket/gobblin-kafka-gcs.job

I have already tried copying all of Gobblin's lib jars into `/usr/lib/hadoop/lib` 
on all machines of the Dataproc cluster, but it didn't work either.

Any ideas?

    gobblin 0.10.0
    hadoop 2.7.3
    dataproc image 1.1

P.S. I've also posted 
[this](http://stackoverflow.com/questions/44036037/nosuchmethoderror-when-trying-to-run-gobblin-on-dataproc)
 on Stack Overflow.
 
*Github Url* : https://github.com/linkedin/gobblin/issues/1877 
*Github Reporter* : *hgabreu* 
*Github Created At* : 2017-05-17T23:02:00Z 
*Github Updated At* : 2017-05-17T23:30:16Z 
h3. Comments 
----
[~ibuenros] wrote on 2017-05-17T23:09:43Z : Hi Henrique,

This is due to conflicting versions of the Commons CLI artifact. Dataproc
probably has an old version on the classpath. The solutions are generally
platform-dependent, but might involve forcing a particular order of the
classpath. Unfortunately, I haven't worked with Dataproc before, so I don't
know exactly how you would solve this, but if you have more questions I'm
happy to answer.

Best,
Issac

 
*Github Url* : 
https://github.com/linkedin/gobblin/issues/1877#issuecomment-302256290 
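
One way to confirm the Commons CLI conflict described in the comment above is to ask the JVM which jar `org.apache.commons.cli.Option` was actually loaded from and whether that copy has a `builder` method. The following is a minimal sketch, not part of Gobblin; the class name `ClasspathProbe` and its helper methods are hypothetical and introduced only for illustration. As far as I know, `Option.builder()` was added in Commons CLI 1.3, so an older jar ahead of it on the classpath would produce exactly this `NoSuchMethodError`.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

/** Hypothetical helper: reports where a class was loaded from and whether it has a given public method. */
class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or a note for bootstrap classes. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? "(bootstrap classpath)" : String.valueOf(source.getLocation());
    }

    /** Returns true if the class can be loaded and exposes a public method with this name. */
    static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is not loadable on this classpath, so the method is not available either.
        }
        return false;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults match the error in this report; both can be overridden on the command line.
        String className = args.length > 0 ? args[0] : "org.apache.commons.cli.Option";
        String methodName = args.length > 1 ? args[1] : "builder";
        Class<?> cls = Class.forName(className);
        System.out.println(className + " loaded from " + locationOf(cls));
        System.out.println("has " + methodName + "(): " + hasMethod(className, methodName));
    }
}
```

To be meaningful, the probe has to run with the same classpath the failing job sees; how to reproduce that classpath on Dataproc is an assumption left to the reader. It may also be worth noting that the `--jars` list in the report does not appear to include a Commons CLI jar at all.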
----
*hgabreu* wrote on 2017-05-17T23:29:29Z : Isn't there a way to build Gobblin 
with all dependencies bundled in the jar, and probably also shaded to avoid such 
conflicts?

BTW, thanks a lot for your quick reply. It was very helpful; I'm trying to sort out 
how to do this on Dataproc. 
 
*Github Url* : 
https://github.com/linkedin/gobblin/issues/1877#issuecomment-302259199



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
