Hey Jeremiah,

That error suggests that the version of samza-yarn is older than
0.13.0. The run-am.sh script was renamed to run-jc.sh here:
https://github.com/apache/samza/commit/9396ee5cc0a35e4e32844547eacebb24ae971c67#diff-2c8f8458c57bd38d1dea86d72d34b50eR89

Is it possible some samza jars are getting cached during deployment?
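
If it helps, here is a rough Python sketch for listing the versions of the
samza and kafka jars in a deployed package. It assumes the jars follow the
usual artifact[_scalaVersion]-version.jar naming; jar_versions and older_than
are illustrative helper names, not part of Samza.

```python
import re

# Matches names such as samza-yarn_2.11-0.13.0.jar or kafka-clients-0.10.1.1.jar:
# artifact name, optional Scala suffix, then a dotted numeric version.
JAR_RE = re.compile(
    r"^(?P<artifact>[A-Za-z][\w.-]*?)"
    r"(?:_(?P<scala>\d+\.\d+))?"
    r"-(?P<version>\d+(?:\.\d+)+)\.jar$"
)


def jar_versions(names, prefixes=("samza", "kafka")):
    """Map each matching artifact to the set of versions seen for it."""
    found = {}
    for name in names:
        match = JAR_RE.match(name)
        if match and match.group("artifact").startswith(prefixes):
            found.setdefault(match.group("artifact"), set()).add(match.group("version"))
    return found


def older_than(version, minimum):
    """Compare dotted numeric versions component by component."""
    def as_tuple(v):
        return tuple(int(part) for part in v.split("."))
    return as_tuple(version) < as_tuple(minimum)
```

Running jar_versions over the file names in the unpacked package's jar
directory (for example via os.listdir) should show one version per artifact.
More than one version for the same artifact, or a samza-yarn version for which
older_than(version, "0.13.0") is true, would point to stale jars.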

-Jake

On Tue, Aug 15, 2017 at 7:31 AM, Jeremiah Adams <jad...@helixeducation.com>
wrote:

> Thanks Jacob, the job is getting a bit further now, but I am seeing a
> different issue.
>
> The job fails and never moves into 'running'. The job looks to be
> launching correctly:
>
> [10.201.11.64] out: 13:48:59.450 [IPC Client (2052489518) connection to
> porter-samza-1.porter.int/127.0.0.1:8032 from centos] DEBUG
> org.apache.hadoop.ipc.Client - IPC Client (2052489518) connection to
> porter-samza-1.porter.int/127.0.0.1:8032 from centos got value #3
> [10.201.11.64] out: 13:48:59.451 [main] DEBUG
> org.apache.hadoop.ipc.ProtobufRpcEngine - Call: getApplicationReport took 2ms
> [10.201.11.64] out: 13:48:59.452 [main] INFO
> org.apache.samza.job.JobRunner - job started successfully - Running
> [10.201.11.64] out: 13:48:59.452 [main] INFO
> org.apache.samza.job.JobRunner - exiting
>
>
> When I dig into the userlogs, the job never moves past the starting
> container; stderr contains:
>
> [centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$ more
> stderr
> /bin/bash: /tmp/hadoop-centos/nm-local-dir/usercache/centos/appcache/application_1502753192195_0007/container_1502753192195_0007_02_000001/__package/bin/run-am.sh: No such file or directory
>
> When I poke at the directory structure, both appcache/ and filecache/
> are empty:
>
> [centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$ ls
> /tmp/hadoop-centos/nm-local-dir/usercache/centos/appcache/
> [centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$
>
>
>
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
>
> ________________________________________
> From: Jacob Maes <jacob.m...@gmail.com>
> Sent: Monday, August 14, 2017 3:12 PM
> To: dev@samza.apache.org
> Subject: Re: Issue with TopicExistsException in 0.13.0
>
> Correction, the exception seems to have moved between kafka versions
> 0.10.0.1 and 0.10.1.1.
>
> Here's the patch that changed both the kafka version and the import
> statement for TopicExistsException:
> https://url.serverdata.net/?aZyQRg2CGut2qgyHrdHxA3r2wRZBhF
> BnHgQFe8bv7-emnODgdhciwPkVKB_BE-ZnZmhwA18Q7rimVruRFx5g0vsvC9cG
> t2jrAYfAucx0goYepLp8ZyfPAPxCv0Xh9CQVXTrqVMnByrbWTNcczkXashg2
> zljIWFPYiRKbG_5H2BvM~
>
> So, you'll want to be using kafka 0.10.1.1.
>
> On Mon, Aug 14, 2017 at 2:00 PM, Jacob Maes <jacob.m...@gmail.com> wrote:
>
> > Hey Jeremiah,
> >
> > It looks like the TopicExistsException should be handled by the system
> > admin and not rethrown:
> > https://url.serverdata.net/?aZyQRg2CGut2qgyHrdHxA3r2wRZBhFBnHgQFe8bv7-
> eli7bCaPPi9BUx7SPWnrBZJsWvG7fAvAkJZWsHy8YrwNKbg0eJOFg9N9UDBA
> B2ODwZOGu2TuRvoZ9NyWbJmDt_g
> > b84b20ffd2/samza-kafka/src/main/scala/org/apache/samza/
> > system/kafka/KafkaSystemAdmin.scala#L442
> >
> > I have a theory about what's happening here. I think the
> > TopicExistsException was moved from the kafka.common package in kafka 0.8.2
> > https://url.serverdata.net/?aGYQUT2PfoZ_Oed64B3A9noxqDhLnbYFqBHw3jimnO
> 5vi3F8i7RsxdGks87OLmlvVSbRBbvJOT8rWW0hz_3vOmg~~
> > common/TopicExistsException.html
> >
> > to the org.apache.kafka.common.errors package in kafka 0.10
> > https://url.serverdata.net/?atT2ehXMhI-BK13fx1xs1ts_Kf81VsaPrd-
> NHf6sUGn2ecNA4kUI3dYoA0607M-H1sV2xtByyu3eJSKvz3Cecre4DPAtt
> j3Qs9n_BrkW6lDT8Xt-ACWGgEYMDI0JoIyzV
> > TopicExistsException.html
> >
> > And Samza 0.13 expects the latter.
> >
> > Can you double check that your job is actually using kafka 0.10.1.1,
> > perhaps by inspecting the jars?
> >
> > -Jake
> >
> > On Mon, Aug 14, 2017 at 11:55 AM, Jeremiah Adams <
> > jad...@helixeducation.com> wrote:
> >
> >> I am having an issue with topic creation after updating dependencies. I
> >> bumped the samza dependencies from scala 2.10 v0.10.1 to scala 2.11 v0.13.0
> >> and the org.apache.kafka dependency from kafka_2.10 0.8.1 to kafka_2.11
> >> 0.10.1.1.
> >> I am seeing an error that the topic already exists, and the job gets stuck
> >> in a loop with logs like those below. The job will not move into the
> >> 'accepted' state in yarn and never consumes the topics it should be
> >> consuming. The zk, yarn and kafka nodes are newly deployed. I'm at a loss,
> >> any ideas?
> >>
> >>
> >> [10.201.9.105] out: 17:18:49.347 [main] DEBUG org.apache.samza.system.kafka.KafkaSystemAdmin - Exception detail:
> >> [10.201.9.105] out: kafka.common.TopicExistsException: Topic "__samza_coordinator_inquiry-submission_1" already exists.
> >> [10.201.9.105] out: at kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:420)
> >> [10.201.9.105] out: at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:404)
> >> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$createStream$1.apply(KafkaSystemAdmin.scala:425)
> >> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$createStream$1.apply(KafkaSystemAdmin.scala:422)
> >> [10.201.9.105] out: at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:82)
> >> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin.createStream(KafkaSystemAdmin.scala:421)
> >> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin.createCoordinatorStream(KafkaSystemAdmin.scala:336)
> >> [10.201.9.105] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:88)
> >> [10.201.9.105] out: at org.apache.samza.job.JobRunner$.doOperation(JobRunner.scala:52)
> >> [10.201.9.105] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:47)
> >> [10.201.9.105] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
> >> [10.201.9.105] out: 17:18:49.347 [main-SendThread(ip-10-201-9-243.us-west-2.compute.internal:2181)] DEBUG org.apache.zookeeper.ClientCnxn
> >> - An exception was thrown while closing send thread for session
> >> 0x25de16b1f500013 : Unable to read additional data from server sessionid
> >> 0x25de16b1f500013, likely server has closed socket
> >> [10.201.9.105] out: 17:18:49.349 [main-EventThread] INFO
> >> org.apache.zookeeper.ClientCnxn - EventThread shut down?
> >>
> >>
> >>
> >> Jeremiah Adams
> >> Software Engineer
> >>
> >
> >
>
