[
https://issues.apache.org/jira/browse/SAMZA-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125057#comment-15125057
]
Tao Feng commented on SAMZA-808:
--------------------------------
I wonder whether Kafka has an exception type that indicates the replication
factor cannot be fulfilled. If it does, we could catch that exception in
KafkaSystemAdmin.createTopicInKafka and set "loop.done" to stop retrying. If
it does not, we could add a maxRetryCounter to the ExponentialSleepStrategy
class to cap the number of retries.
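
A rough sketch of how the two options could combine, for illustration only.
This is not Samza's ExponentialSleepStrategy API: the class BoundedRetry, the
method retry, and the isRetriable predicate are hypothetical names, and which
Kafka exception (if any) the predicate should reject is left open.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper, not part of Samza or Kafka.
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls op until it succeeds. Gives up and rethrows the last exception
     * when isRetriable rejects it or when maxAttempts calls have been made.
     * The sleep between attempts doubles each time, capped at maxDelayMs.
     */
    static <T> T retry(Callable<T> op,
                       Predicate<Exception> isRetriable,
                       int maxAttempts,
                       long initialDelayMs,
                       long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Stop on a failure that retrying cannot fix, or when the
                // retry budget is used up.
                if (!isRetriable.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
    }
}
```

With something like this, a topic creation that fails because the replication
factor exceeds the broker count would surface as an error after the first
attempt if the predicate recognizes it, and after maxAttempts otherwise,
instead of logging warnings forever.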
> KafkaSystemAdmin.createChangelogStream retries indefinitely
> -----------------------------------------------------------
>
> Key: SAMZA-808
> URL: https://issues.apache.org/jira/browse/SAMZA-808
> Project: Samza
> Issue Type: Bug
> Components: kafka
> Affects Versions: 0.9.1
> Reporter: Tommy Becker
>
> Currently, KafkaSystemAdmin.createChangelogStream() treats all failures as
> transient. We recently hit an issue testing with a single node Kafka cluster
> where the changelog topic creation failed because we were getting the default
> replication factor of 2, which is obviously not achievable on a single node
> Kafka. It took quite a while to notice the job was not doing anything because
> createChangelogStream only logs a warning and continues to retry
> indefinitely. If it is not possible and/or desirable to distinguish
> exceptions that have a reasonable chance of success on retry vs not, I think
> Samza should limit the number of retries it will perform.