[
https://issues.apache.org/jira/browse/SAMZA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489906#comment-17489906
]
Bharath Kumarasubramanian commented on SAMZA-2721:
--------------------------------------------------
Here is a sample case in which the container hit the code path that swallows
the exception and only logs it:
{code:java}
2022-02-03 00:15:53.624 [crm-sync-entity-resolution-pipeline-i001-auditor] KafkaProducer [INFO] [Producer clientId=crm-sync-entity-resolution-pipeline-i001-auditor] Closing the Kafka producer with timeoutMillis = 9223370393007422183 ms.
2022-02-03 00:15:53.625 [main] LiKafkaConsumerImpl [INFO] Shutdown complete in 11 millis
2022-02-03 00:15:53.626 [main] ContainerLaunchUtil [ERROR] Container stopped with Exception.
2022-02-03 00:15:53.626 [main] CoordinatorStreamStore [INFO] Stopping the coordinator stream system consumer.
2022-02-03 00:15:53.626 [main] LiKafkaConsumerImpl [INFO] Shutting down ...
2022-02-03 00:15:53.627 [main] AbstractAuditor [INFO] Closing auditor with timeout 9223372036854775807 MILLI
{code}
https://github.com/apache/samza/blob/2e7c2fe6c095d05b1b75fc0c2768ad0e9a81a085/samza-core/src/main/java/org/apache/samza/runtime/ContainerLaunchUtil.java#L185
> Container should exit with non-zero status code in case of errors during
> launch
> -------------------------------------------------------------------------------
>
> Key: SAMZA-2721
> URL: https://issues.apache.org/jira/browse/SAMZA-2721
> Project: Samza
> Issue Type: Bug
> Reporter: Bharath Kumarasubramanian
> Priority: Major
>
> {*}Problem{*}:
> During its launch sequence, ContainerLaunchUtil swallows exceptions and
> proceeds to shut down with a 0 status code. As a result, the AM does not
> restart the container.
> {*}Description{*}:
> As part of the launch sequence, the run method performs various
> initialization steps before kicking off the container. If an exception is
> thrown during these steps, the run method catches all errors but only logs
> them and proceeds to shut down as usual.
> Because of the normal exit, the AM treats the container as having completed
> successfully and hence doesn't restart it, so the failed container remains
> failed.
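
The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the actual ContainerLaunchUtil code: the class LaunchExitCodeSketch, the LaunchStep interface, and both method names are hypothetical, and the choice of 1 as the failure status is an assumption. It only contrasts a launch path that logs and returns 0 with one that reports a non-zero status for the process to exit with.

```java
public class LaunchExitCodeSketch {

    /** Hypothetical stand-in for the initialization steps run before the container starts. */
    interface LaunchStep {
        void run() throws Exception;
    }

    /** Mirrors the reported problem: the error is logged, but the status stays 0. */
    static int launchSwallowing(LaunchStep step) {
        try {
            step.run();
        } catch (Throwable t) {
            System.err.println("Container stopped with Exception. " + t);
        }
        return 0; // the failure is invisible to whoever inspects the exit status
    }

    /** One possible fix: still log, but surface the failure as a non-zero status. */
    static int launchPropagating(LaunchStep step) {
        try {
            step.run();
            return 0;
        } catch (Throwable t) {
            System.err.println("Container stopped with Exception. " + t);
            return 1; // assumed failure code; any non-zero value signals an abnormal exit
        }
    }

    public static void main(String[] args) {
        LaunchStep failing = () -> {
            throw new IllegalStateException("initialization failed");
        };
        System.out.println("swallowing status = " + launchSwallowing(failing));
        System.out.println("propagating status = " + launchPropagating(failing));
        // A real launcher would end with System.exit(status), so that the
        // resource manager sees the failure and can restart the container.
    }
}
```

With the second variant, a failure during initialization yields a non-zero status, which is what allows the AM to distinguish a failed container from one that completed normally.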
--
This message was sent by Atlassian Jira
(v8.20.1#820001)