Hi, Rick,

Did you include the fix from SAMZA-723 in your test? And could you confirm that
the errors come from the JobRunner log?
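Also, the log shows ZooKeeper reaching SyncConnected and the coordinator topic being found, but the Kafka producer then times out fetching metadata. So it may be worth checking that the broker in bootstrap.servers (devstream01.chartbeat.net:9092) is reachable from yarnmaster01. Below is a minimal Python sketch of such a check. check_broker and parse_bootstrap are hypothetical helper names. The sketch only tests DNS resolution and a plain TCP connection; it does not speak the Kafka protocol or request topic metadata.

```python
import socket
from typing import List, Tuple


def parse_bootstrap(servers: str) -> List[Tuple[str, int]]:
    """Split a Kafka-style "host:port,host:port" string into (host, port) pairs.

    Hypothetical helper; bracketed IPv6 literals such as "[::1]:9092" are not handled.
    """
    pairs = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty input and trailing commas
        host, _, port = entry.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def check_broker(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, str]:
    """Resolve host, then try to open a TCP connection to host:port.

    Hypothetical helper: a successful result only shows the port is reachable
    from this machine, not that the broker will serve metadata for a topic.
    """
    try:
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS lookup failed for {host}: {exc}"
    try:
        # create_connection tries each resolved address until one connects.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return False, f"TCP connect to {host}:{port} failed: {exc}"
    return True, f"{host}:{port} reachable ({len(addrs)} address(es) resolved)"
```

For example, running check_broker("devstream01.chartbeat.net", 9092) on yarnmaster01 should return True as its first element if the broker port is reachable. Brokers hand back their advertised host names in metadata responses, and the producer connects to those afterwards. If a broker advertises a different name (advertised.host.name in its server.properties, assuming a 0.8.x-era broker), that name also has to resolve from the YARN hosts, so it may be worth checking the same way.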

-Yi

On Thu, Nov 12, 2015 at 8:48 AM, Rick Mangi <r...@chartbeat.com> wrote:

> Hi,
>
> I’m trying to migrate our Samza jobs to the 0.10.0 snapshot (built from the
> latest code). Everything works fine running locally (although I had to make
> some changes to the local grid’s Kafka, since checkpointing seems to require
> replication_factor > 1), but when I deploy it against my production YARN
> cluster I get these errors.
>
> [yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
> [yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
> [yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
> [yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
> [yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
> [yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
> [yarnmaster01] out:     value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
> [yarnmaster01] out:     key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
> [yarnmaster01] out:     block.on.buffer.full = true
> [yarnmaster01] out:     retry.backoff.ms = 100
> [yarnmaster01] out:     buffer.memory = 33554432
> [yarnmaster01] out:     batch.size = 16384
> [yarnmaster01] out:     metrics.sample.window.ms = 30000
> [yarnmaster01] out:     metadata.max.age.ms = 300000
> [yarnmaster01] out:     receive.buffer.bytes = 32768
> [yarnmaster01] out:     timeout.ms = 30000
> [yarnmaster01] out:     max.in.flight.requests.per.connection = 1
> [yarnmaster01] out:     bootstrap.servers = [devstream01.chartbeat.net:9092]
> [yarnmaster01] out:     metric.reporters = []
> [yarnmaster01] out:     client.id = samza_producer-metrics_reporter-1-1447342853273-4
> [yarnmaster01] out:     compression.type = none
> [yarnmaster01] out:     retries = 2147483647
> [yarnmaster01] out:     max.request.size = 1048576
> [yarnmaster01] out:     send.buffer.bytes = 131072
> [yarnmaster01] out:     acks = 1
> [yarnmaster01] out:     reconnect.backoff.ms = 10
> [yarnmaster01] out:     linger.ms = 0
> [yarnmaster01] out:     metrics.num.samples = 2
> [yarnmaster01] out:     metadata.fetch.timeout.ms = 60000
> [yarnmaster01] out:
> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
> [yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
> [yarnmaster01] out:     at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
> [yarnmaster01] out:     at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
> [yarnmaster01] out:     at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
> [yarnmaster01] out:     at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
> [yarnmaster01] out:     at org.apache.samza.job.JobRunner.main(JobRunner.scala)
> [yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
> [yarnmaster01] out:
>
>
> Warning: run() received nonzero return code 1 while executing './bin/run-job.sh -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/conf/metrics_reporter.properties'!
>
>
> This looks similar to https://issues.apache.org/jira/browse/SAMZA-560 but
> I’m not using a StreamAppender in log4j.
>
> Any ideas? My first thought is that I might have to delete the existing
> checkpoint topics, but that would mean we can’t migrate completely until the
> 0.10.0 release unless we want to run snapshot code in production.
>
> Thanks!
>
> Rick
