[
https://issues.apache.org/jira/browse/STORM-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15331471#comment-15331471
]
Nico Meyer edited comment on STORM-1879 at 6/15/16 10:05 AM:
-------------------------------------------------------------
The trouble starts here:
bq. 2016-06-14 18:31:04.465 o.a.s.d.supervisor [INFO] Shutting down
ee56fb9d-2657-4d3f-b52a-1ae4abae85f7:
leading to:
{quote}2016-06-14 18:31:04.471 o.a.s.util [DEBUG] Rmr path
/var/lib/storm/storm-local/workers//heartbeats
2016-06-14 18:31:04.471 o.a.s.util [DEBUG] Rmr path
/var/lib/storm/storm-local/workers//pids
2016-06-14 18:31:04.471 o.a.s.util [DEBUG] Rmr path
/var/lib/storm/storm-local/workers/{quote}
I am still trying to figure out why this happens, but I think an {{assert}} is
in order somewhere.
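To illustrate the kind of {{assert}} meant here, a minimal sketch only: the supervisor in this version is Clojure, and the class name, method name, and path handling below are assumptions rather than the actual code. The point is simply to fail fast on a blank worker id instead of deleting paths like {{workers//heartbeats}}.
{code}
// Hypothetical guard, NOT the actual Storm supervisor code (which is Clojure
// in 1.0.x). Illustrates asserting that the worker id is non-blank before any
// recursive delete, which would have caught the "workers//heartbeats" paths
// seen in the debug log above.
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;

public class WorkerCleanupSketch {

    /** Recursively delete the heartbeat/pid directories for one worker. */
    static void removeWorkerDirs(String workerRoot, String workerId) throws Exception {
        if (workerId == null || workerId.isEmpty()) {
            // Fail fast instead of rmr'ing "<worker-root>//..." style paths.
            throw new IllegalStateException("Refusing to clean up worker dirs: blank worker id");
        }
        Path base = Paths.get(workerRoot, workerId);
        for (String sub : new String[] {"heartbeats", "pids"}) {
            Path dir = base.resolve(sub);
            if (Files.exists(dir)) {
                // Rough equivalent of the "Rmr path ..." debug lines quoted above.
                Files.walk(dir)
                     .sorted(Comparator.reverseOrder())
                     .map(Path::toFile)
                     .forEach(File::delete);
            }
        }
    }
}
{code}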
> Supervisor may not shut down workers cleanly
> --------------------------------------------
>
> Key: STORM-1879
> URL: https://issues.apache.org/jira/browse/STORM-1879
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.1
> Reporter: Stig Rohde Døssing
> Attachments: nimbus-supervisor.zip, supervisor.log
>
>
> We've run into a strange issue with a zombie worker process. It looks like
> the worker pid file somehow got deleted without the worker process shutting
> down. This causes the supervisor to repeatedly and unsuccessfully try to kill
> the worker, and means multiple workers may be assigned to the same port.
> The worker root folder sticks around because the worker is still heartbeating
> to it.
> It may or may not be related that we've seen Nimbus occasionally enter an
> infinite loop of printing logs similar to the below.
> {code}
> 2016-05-19 14:55:14.196 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.210 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.218 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.256 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.273 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.316 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> {code}
> This continues until Nimbus is rebooted. We also see repeating blocks of
> log lines similar to those below.
> {code}
> 2016-06-02 07:45:03.656 o.a.s.d.nimbus [INFO] Cleaning up
> ZendeskTicketTopology-127-1464780171
> 2016-06-02 07:45:04.132 o.a.s.d.nimbus [INFO]
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormjar.jar)
> 2016-06-02 07:45:04.144 o.a.s.d.nimbus [INFO]
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormconf.ser)
> 2016-06-02 07:45:04.155 o.a.s.d.nimbus [INFO]
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormcode.ser)
> {code}
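As an editorial sketch (not part of the original report): the reported zombie-worker behaviour is easier to see with a minimal illustration of a pid-file based shutdown. The class and method names below are hypothetical and this is not the actual supervisor shutdown code; it only shows why a deleted pid file leaves nothing to signal, so the worker process survives while the supervisor treats the slot as reclaimable.
{code}
// Illustrative only; not the real Storm supervisor logic. A shutdown step that
// can only signal the pids it reads back from the worker's pids directory has
// nothing to do once the pid file has been deleted, so the worker keeps running.
import java.io.File;

public class WorkerShutdownSketch {

    /** Hypothetical shutdown step: signal every pid recorded for the worker. */
    static boolean shutdownWorker(File workerPidsDir) {
        File[] pidFiles = workerPidsDir.listFiles();
        if (pidFiles == null || pidFiles.length == 0) {
            // Pid file already gone: nothing to signal, so the real process
            // survives while the supervisor believes the slot can be reused.
            return false;
        }
        for (File pidFile : pidFiles) {
            long pid = Long.parseLong(pidFile.getName());
            // Rough equivalent of "kill <pid>" (ProcessHandle is Java 9+).
            ProcessHandle.of(pid).ifPresent(ProcessHandle::destroy);
        }
        return true;
    }
}
{code}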