Hi, Rick,

Yes, please open a JIRA w/ your configuration, deployment set up and sequence, and logs from JobRunner.

Thanks a lot!

-Yi

On Thu, Nov 12, 2015 at 10:10 AM, Rick Mangi <r...@chartbeat.com> wrote:

> Hi Yi,
>
> I pulled from master and built this morning.
>
> Yes, that’s the output from JobRunner. I also tried setting a job.id to
> see if this was an issue migrating from an old task checkpoint topic but I
> got the same result.
>
> Would you like me to open a jira ticket?
>
> Thanks,
>
> Rick
>
> > On Nov 12, 2015, at 12:59 PM, Yi Pan <nickpa...@gmail.com> wrote:
> >
> > Hi, Rick,
> >
> > Did you get the fix in SAMZA-723 in your test? And could you confirm that
> > the errors are from JobRunner log?
> >
> > -Yi
> >
> > On Thu, Nov 12, 2015 at 8:48 AM, Rick Mangi <r...@chartbeat.com> wrote:
> >
> >> Hi,
> >>
> >> I’m trying to migrate our samza jobs to 0.10.0 snapshot (built against the
> >> latest). Everything works fine running locally (although I had to make some
> >> changes to the local grid’s kafka since the checkpointing seems to require
> >> replication_factor > 1) but when I deploy it against my production yarn
> >> cluster I get these errors.
> >>
> >> [yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
> >> [yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
> >> [yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
> >> [yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
> >> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
> >> [yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
> >> [yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
> >> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
> >> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
> >> [yarnmaster01] out: value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
> >> [yarnmaster01] out: key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
> >> [yarnmaster01] out: block.on.buffer.full = true
> >> [yarnmaster01] out: retry.backoff.ms = 100
> >> [yarnmaster01] out: buffer.memory = 33554432
> >> [yarnmaster01] out: batch.size = 16384
> >> [yarnmaster01] out: metrics.sample.window.ms = 30000
> >> [yarnmaster01] out: metadata.max.age.ms = 300000
> >> [yarnmaster01] out: receive.buffer.bytes = 32768
> >> [yarnmaster01] out: timeout.ms = 30000
> >> [yarnmaster01] out: max.in.flight.requests.per.connection = 1
> >> [yarnmaster01] out: bootstrap.servers = [devstream01.chartbeat.net:9092]
> >> [yarnmaster01] out: metric.reporters = []
> >> [yarnmaster01] out: client.id = samza_producer-metrics_reporter-1-1447342853273-4
> >> [yarnmaster01] out: compression.type = none
> >> [yarnmaster01] out: retries = 2147483647
> >> [yarnmaster01] out: max.request.size = 1048576
> >> [yarnmaster01] out: send.buffer.bytes = 131072
> >> [yarnmaster01] out: acks = 1
> >> [yarnmaster01] out: reconnect.backoff.ms = 10
> >> [yarnmaster01] out: linger.ms = 0
> >> [yarnmaster01] out: metrics.num.samples = 2
> >> [yarnmaster01] out: metadata.fetch.timeout.ms = 60000
> >> [yarnmaster01] out:
> >> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
> >> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
> >> [yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
> >> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
> >> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
> >> [yarnmaster01] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
> >> [yarnmaster01] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
> >> [yarnmaster01] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
> >> [yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
> >> [yarnmaster01] out:
> >>
> >> Warning: run() received nonzero return code 1 while executing
> >> './bin/run-job.sh -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/conf/metrics_reporter.properties'!
> >>
> >> This looks similar to https://issues.apache.org/jira/browse/SAMZA-560 but
> >> I’m not using a StreamAppender in log4j.
> >>
> >> Any ideas? My first thought is that I might have to delete the existing
> >> checkpoint topics but that would mean we can’t migrate completely until the
> >> 0.10.0 release unless we want to run snapshot code in production.
> >>
> >> Thanks!
> >>
> >> Rick