Hey Jeremiah,

That error would suggest that the version of samza-yarn is older than 0.13.0. The run-am.sh script was renamed to run-jc.sh here:
https://github.com/apache/samza/commit/9396ee5cc0a35e4e32844547eacebb24ae971c67#diff-2c8f8458c57bd38d1dea86d72d34b50eR89
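
To see which samza-yarn version actually ended up in the package, the jar file names can be compared against 0.13.0. The following is a rough Python sketch, not something from the Samza codebase. It assumes the usual Maven-style names (for example samza-yarn_2.11-0.13.0.jar), and the function names and the lib directory are made up for illustration.

```python
import re
from pathlib import Path

# Assumed Maven-style name: <artifact>_<scalaVersion>-<version>.jar
# Jars without a Scala suffix or with a qualifier such as -SNAPSHOT are skipped.
JAR_NAME = re.compile(
    r"^(?P<artifact>[A-Za-z][\w.-]*?)_(?P<scala>\d+\.\d+)-(?P<version>\d+(?:\.\d+)*)\.jar$"
)


def parse_jar_name(name):
    """Return (artifact, scala_version, version_tuple), or None if the name does not match."""
    match = JAR_NAME.match(name)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("artifact"), match.group("scala"), version


def stale_samza_jars(jar_names, minimum=(0, 13, 0)):
    """List the samza-* jars whose version is older than `minimum`."""
    stale = []
    for name in jar_names:
        parsed = parse_jar_name(name)
        if parsed and parsed[0].startswith("samza-") and parsed[2] < minimum:
            stale.append(name)
    return stale


def jars_in(lib_dir):
    """Hypothetical helper: names of all jars directly inside `lib_dir`."""
    return sorted(path.name for path in Path(lib_dir).glob("*.jar"))
```

Pointing jars_in at the lib directory of the unpacked job package (that location is an assumption about the package layout) and passing the result to stale_samza_jars should give an empty list if only 0.13.0 jars were deployed.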

Is it possible some samza jars are getting cached during deployment?

-Jake

On Tue, Aug 15, 2017 at 7:31 AM, Jeremiah Adams <jad...@helixeducation.com> wrote:

> Thanks Jacob, the job is getting a bit further now but am seeing a
> different issue now.
>
> The job fails and never moves into 'running'. The job looks to be
> launching correctly:
>
> [10.201.11.64] out: 13:48:59.450 [IPC Client (2052489518) connection to porter-samza-1.porter.int/127.0.0.1:8032 from centos] DEBUG org.apache.hadoop.ipc.Client - IPC Client (2052489518) connection to porter-samza-1.porter.int/127.0.0.1:8032 from centos got value #3
> [10.201.11.64] out: 13:48:59.451 [main] DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: getApplicationReport took 2ms
> [10.201.11.64] out: 13:48:59.452 [main] INFO org.apache.samza.job.JobRunner - job started successfully - Running
> [10.201.11.64] out: 13:48:59.452 [main] INFO org.apache.samza.job.JobRunner - exiting
>
> When I dig into the userlogs, the job never moves from the starting
> container, stderr contains:
>
> [centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$ more stderr
> /bin/bash: /tmp/hadoop-centos/nm-local-dir/usercache/centos/appcache/application_1502753192195_0007/container_1502753192195_0007_02_000001/__package/bin/run-am.sh: No such file or directory
>
> When I poke at the directory structure, the directory is empty at
> appcache/ and filecache/ both:
>
> [centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$ ls /tmp/hadoop-centos/nm-local-dir/usercache/centos/appcache/
> [centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
> Blog | Twitter | Facebook | LinkedIn
>
> ________________________________________
> From: Jacob Maes <jacob.m...@gmail.com>
> Sent: Monday, August 14, 2017 3:12 PM
> To: dev@samza.apache.org
> Subject: Re: Issue with TopicExistsException in 0.13.0
>
> Correction, the exception seems to have moved between kafka version
> 0.10.0.1 and 0.10.1.1
>
> Here's the patch that changed both the kafka version and the import
> statement for TopicExistsException:
> https://url.serverdata.net/?aZyQRg2CGut2qgyHrdHxA3r2wRZBhF
> BnHgQFe8bv7-emnODgdhciwPkVKB_BE-ZnZmhwA18Q7rimVruRFx5g0vsvC9cG
> t2jrAYfAucx0goYepLp8ZyfPAPxCv0Xh9CQVXTrqVMnByrbWTNcczkXashg2
> zljIWFPYiRKbG_5H2BvM~
>
> So, you'll want to be using kafka 0.10.1.1.
>
> On Mon, Aug 14, 2017 at 2:00 PM, Jacob Maes <jacob.m...@gmail.com> wrote:
>
> > Hey Jeremiah,
> >
> > It looks like the TopicExistsException should be handled by the system
> > admin and not rethrown:
> > https://url.serverdata.net/?aZyQRg2CGut2qgyHrdHxA3r2wRZBhFBnHgQFe8bv7-
> > eli7bCaPPi9BUx7SPWnrBZJsWvG7fAvAkJZWsHy8YrwNKbg0eJOFg9N9UDBA
> > B2ODwZOGu2TuRvoZ9NyWbJmDt_g
> > b84b20ffd2/samza-kafka/src/main/scala/org/apache/samza/
> > system/kafka/KafkaSystemAdmin.scala#L442
> >
> > I have a theory what's happening here. I think the TopicExistsException
> > was moved from the org.apache.kafka.common package in kafka 0.8.2
> > https://url.serverdata.net/?aGYQUT2PfoZ_Oed64B3A9noxqDhLnbYFqBHw3jimnO
> > 5vi3F8i7RsxdGks87OLmlvVSbRBbvJOT8rWW0hz_3vOmg~~
> > common/TopicExistsException.html
> >
> > to the org.apache.kafka.common.errors package in kafka 0.10
> > https://url.serverdata.net/?atT2ehXMhI-BK13fx1xs1ts_Kf81VsaPrd-
> > NHf6sUGn2ecNA4kUI3dYoA0607M-H1sV2xtByyu3eJSKvz3Cecre4DPAtt
> > j3Qs9n_BrkW6lDT8Xt-ACWGgEYMDI0JoIyzV
> > TopicExistsException.html
> >
> > And Samza 0.13 expects the latter.
> >
> > Can you double check that your job is actually using kafka 0.10.1.1,
> > perhaps by inspecting the jars?
> >
> > -Jake
> >
> > On Mon, Aug 14, 2017 at 11:55 AM, Jeremiah Adams <jad...@helixeducation.com> wrote:
> >
> >> I am having an issue with topic creation after updating dependencies. I
> >> bumped samza dependencies from scala 2.10 v 0.10.1 to scala 2.11 0.13.0
> >> and org.apache.kafka dependency from kafka_2.10 0.8.1 to kafka_2.11
> >> 0.10.1.1.
> >> I am seeing an error that the topic already exists and the job gets stuck
> >> in a loop with logs like below. The job will not move into 'accepted' state
> >> in yarn and never consumes the topics it should be consuming. The zk, yarn
> >> and kafka nodes are newly deployed. I'm at a loss, any ideas?
> >>
> >> [10.201.9.105] out: 17:18:49.347 [main] DEBUG org.apache.samza.system.kafka.KafkaSystemAdmin - Exception detail:
> >> [10.201.9.105] out: kafka.common.TopicExistsException: Topic "__samza_coordinator_inquiry-submission_1" already exists.
> >> [10.201.9.105] out: at kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:420)
> >> [10.201.9.105] out: at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:404)
> >> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$createStream$1.apply(KafkaSystemAdmin.scala:425)
> >> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$createStream$1.apply(KafkaSystemAdmin.scala:422)
> >> [10.201.9.105] out: at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:82)
> >> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin.createStream(KafkaSystemAdmin.scala:421)
> >> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin.createCoordinatorStream(KafkaSystemAdmin.scala:336)
> >> [10.201.9.105] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:88)
> >> [10.201.9.105] out: at org.apache.samza.job.JobRunner$.doOperation(JobRunner.scala:52)
> >> [10.201.9.105] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:47)
> >> [10.201.9.105] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
> >> [10.201.9.105] out: 17:18:49.347 [main-SendThread(ip-10-201-9-243.us-west-2.compute.internal:2181)] DEBUG org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x25de16b1f500013 : Unable to read additional data from server sessionid 0x25de16b1f500013, likely server has closed socket
> >> [10.201.9.105] out: 17:18:49.349 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down?
> >>
> >> Jeremiah Adams
> >> Software Engineer
> >> https://url.serverdata.net/?ahfhEufaAWbezBrUFPG98ZJcterGfI
> >> erU3ZwsA3Gv_C0~<https://url.serverdata.net/?a49H2rNGIIBtQOw6md8OcHp-
> >> qKE3Xn2gNiZ3dlqAeSDA~>
> >> Blog<https://url.serverdata.net/?a49H2rNGIIBtQOw6md8OcHgFEZu-
> >> KYuiu8doY66NWwmmyWxz7kC-27Yfnbdgd2wyh5gjXUa6LMT_NRXsj1g1VVg~~> | Twitter<
> >> https://url.serverdata.net/?a0Q7ct5_6cOdbJ86kpWB0zx6RbtgugTVC7lU_
> >> W7za50jLdZQGpLgVlR1V06zckSaM5oOKb6QBo46Qp9xt0Tt7Aw~~> | Facebook<
> >> https://url.serverdata.net/?aAmyAO_nS_C1aDgBLeKyGTt253c4xO8jY2FEj4eUKEJA~.
> >> com/HelixEducation> | LinkedIn<https://url.serverdata.net/?aanlcNI-
> >> cN74Gdz-TD332xAl6lHu7TRNICWoHUFjYf-KlBjrCGHoYR65b3rl-
> >> OyW10nWFv6hwYvUSoVHL4b3vGA~~>
> >>