Hi Rick,

Did you include the fix from SAMZA-723 in your test? And could you confirm that the errors are from the JobRunner log?
-Yi

On Thu, Nov 12, 2015 at 8:48 AM, Rick Mangi <r...@chartbeat.com> wrote:

> Hi,
>
> I’m trying to migrate our samza jobs to 0.10.0 snapshot (built against the
> latest). Everything works fine running locally (although I had to make some
> changes to the local grid’s kafka since the checkpointing seems to require
> replication_factor > 1) but when I deploy it against my production yarn
> cluster I get these errors.
>
> [yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
> [yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
> [yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
> [yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
> [yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
> [yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
> [yarnmaster01] out: value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
> [yarnmaster01] out: key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
> [yarnmaster01] out: block.on.buffer.full = true
> [yarnmaster01] out: retry.backoff.ms = 100
> [yarnmaster01] out: buffer.memory = 33554432
> [yarnmaster01] out: batch.size = 16384
> [yarnmaster01] out: metrics.sample.window.ms = 30000
> [yarnmaster01] out: metadata.max.age.ms = 300000
> [yarnmaster01] out: receive.buffer.bytes = 32768
> [yarnmaster01] out: timeout.ms = 30000
> [yarnmaster01] out: max.in.flight.requests.per.connection = 1
> [yarnmaster01] out: bootstrap.servers = [devstream01.chartbeat.net:9092]
> [yarnmaster01] out: metric.reporters = []
> [yarnmaster01] out: client.id = samza_producer-metrics_reporter-1-1447342853273-4
> [yarnmaster01] out: compression.type = none
> [yarnmaster01] out: retries = 2147483647
> [yarnmaster01] out: max.request.size = 1048576
> [yarnmaster01] out: send.buffer.bytes = 131072
> [yarnmaster01] out: acks = 1
> [yarnmaster01] out: reconnect.backoff.ms = 10
> [yarnmaster01] out: linger.ms = 0
> [yarnmaster01] out: metrics.num.samples = 2
> [yarnmaster01] out: metadata.fetch.timeout.ms = 60000
> [yarnmaster01] out:
> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
> [yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
> [yarnmaster01] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
> [yarnmaster01] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
> [yarnmaster01] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
> [yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
> [yarnmaster01] out:
>
> Warning: run() received nonzero return code 1 while executing './bin/run-job.sh -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/conf/metrics_reporter.properties'!
>
> This looks similar to https://issues.apache.org/jira/browse/SAMZA-560 but
> I’m not using a StreamAppender in log4j.
>
> Any ideas? My first thought is that I might have to delete the existing
> checkpoint topics, but that would mean we can’t migrate completely until the
> 0.10.0 release unless we want to run snapshot code in production.
>
> Thanks!
>
> Rick
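About the `TimeoutException` above: the 60000 ms in the message matches `metadata.fetch.timeout.ms = 60000` in the logged `ProducerConfig`. That means the coordinator stream producer could not fetch topic metadata from the broker in `bootstrap.servers` within that window. One common cause is that the host running `run-job.sh` cannot reach that broker, or cannot reach the host name the broker advertises. A quick way to rule out basic connectivity is a plain TCP connect from the YARN master.

The Java sketch below is hypothetical: `BrokerCheck` is not part of Samza or Kafka. It only tests whether a socket can be opened, so a successful connect does not prove that Kafka metadata requests will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic helper (not part of Samza or Kafka). */
public class BrokerCheck {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers connection refused, connect timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the broker from the logged bootstrap.servers setting.
        String host = args.length > 0 ? args[0] : "devstream01.chartbeat.net";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        boolean ok = isReachable(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

To use it, compile with `javac BrokerCheck.java` and run `java BrokerCheck devstream01.chartbeat.net 9092` on the YARN master. If the connect succeeds there but the job still times out, the broker's advertised host name and port are the next thing to compare.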