[ https://issues.apache.org/jira/browse/SAMZA-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xinyu Liu updated SAMZA-1572:
-----------------------------
Fix Version/s: 0.14.1 (was: 0.15.0)
> Add fixed retries on failure in KafkaCheckpointManager
> ------------------------------------------------------
>
> Key: SAMZA-1572
> URL: https://issues.apache.org/jira/browse/SAMZA-1572
> Project: Samza
> Issue Type: Bug
> Reporter: Shanthoosh Venkataraman
> Assignee: Shanthoosh Venkataraman
> Priority: Major
> Fix For: 0.14.1
>
>
> KafkaCheckpointManager.writeCheckpoint currently goes into an infinite loop
> when an unrecoverable failure happens. This blocks the commit phase
> indefinitely and thereby prevents processing. The problem becomes visible
> only when the job is shut down: shutdown also blocks indefinitely, because
> the run loop is stuck in the commit phase and ignores the shutdown markers.
> {code:java}
> 2018/01/22 19:18:10.503 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Flush failed. One or more batches of messages were not sent. Retrying.
> 2018/01/22 19:18:10.604 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:10.804 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:11.204 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:12.005 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:13.605 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:16.805 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:23.205 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:33.206 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:43.206 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:18:53.206 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:19:03.207 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:19:13.207 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
> 2018/01/22 19:19:23.207 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exception.. Retrying.
> {code}
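A minimal sketch of what "fixed retries" could look like, written in plain Java and not taken from the Samza code base: the class `BoundedRetry` and its methods `callWithRetries` and `backoffMillis` are hypothetical names, and the attempt limit is an arbitrary assumption. The backoff shape mirrors the log, where the gap between attempts doubles from about 100 ms up to 6.4 s and then stays at about 10 s. Once the attempts are exhausted, the last failure is rethrown so the caller fails instead of blocking forever.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Samza): runs an action a fixed number of
 * times with capped exponential backoff, then gives up instead of looping forever.
 */
final class BoundedRetry {

    /** Delay before retry number {@code retry} (0-based): initial * 2^retry, capped at max. */
    static long backoffMillis(int retry, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        // Stop doubling once the cap is reached, which also avoids overflow.
        for (int i = 0; i < retry && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Calls {@code action} up to {@code maxAttempts} times; rethrows the last failure. */
    static <T> T callWithRetries(Supplier<T> action, int maxAttempts,
                                 long initialMillis, long maxMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and retry
            }
            if (attempt + 1 < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis(attempt, initialMillis, maxMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new RuntimeException("Interrupted while retrying", ie);
                }
            }
        }
        // Every attempt failed: surface the error instead of blocking the caller forever.
        throw new RuntimeException("Giving up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        try {
            // Simulates a producer that never recovers; 3 attempts, 10 ms initial, 100 ms cap.
            callWithRetries(() -> {
                calls[0]++;
                throw new IllegalStateException("Producer was unable to recover");
            }, 3, 10, 100);
        } catch (RuntimeException e) {
            // Prints: failed after 3 calls: Producer was unable to recover
            System.out.println("failed after " + calls[0] + " calls: " + e.getCause().getMessage());
        }
    }
}
```

With `backoffMillis(retry, 100, 10_000)` the sketch produces 100, 200, 400, ..., 6400 ms and then 10000 ms, matching the spacing in the log. Bounding the attempts means a writeCheckpoint-style caller would eventually see an exception, so the run loop could leave the commit phase and act on the shutdown markers rather than hang.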