[
https://issues.apache.org/jira/browse/STORM-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344431#comment-15344431
]
Robert Joseph Evans commented on STORM-1879:
--------------------------------------------
I have not seen either of these issues on our clusters.
I don't have a lot of time right now to try to reproduce it. I don't know how
much critical information, if any, is going to be in the memory of nimbus/the
supervisor for your topology, but I don't suspect that it will be very much.
If any of you are OK with taking a heap dump of both nimbus and the supervisor
that is causing the issues when this happens, it would probably be really
helpful.
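For anyone willing to take the dumps, here is a minimal sketch of building the
command with the JDK's jmap tool. It assumes jmap is on the PATH and is run as
the same user as the target JVM; the helper name, the pid and the output path
are placeholders for illustration, not anything taken from Storm.

```python
def heap_dump_command(pid, out_path, live_only=False):
    """Build a jmap invocation that writes a binary heap dump for one JVM.

    pid and out_path are placeholders; substitute the real nimbus or
    supervisor process id and a path with enough free disk space.
    """
    options = ["format=b", "file=" + out_path]
    if live_only:
        # "live" restricts the dump to reachable objects.
        options.insert(0, "live")
    return ["jmap", "-dump:" + ",".join(options), str(pid)]


# Example with a made-up pid; pass the list to subprocess.run to execute it.
nimbus_cmd = heap_dump_command(12345, "/tmp/nimbus.hprof")
```

The resulting list can be handed to subprocess.run(nimbus_cmd, check=True), once
for nimbus and once for the affected supervisor.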
The errors above happen occasionally because the cleanup of the files does not
always coincide with shooting the worker, so the worker could be
relaunched/come up after the supervisor removed the config file it needs.
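The ordering problem described above can be sketched as a toy model. This is
not Storm code: the function names, the file name and the two-step sequence are
assumptions made only to show why a worker that comes up after cleanup fails.

```python
import os
import tempfile


def write_worker_conf(root, topology_id):
    """Create a stand-in for the config file a worker reads at startup."""
    path = os.path.join(root, topology_id + "-stormconf.ser")
    with open(path, "w") as f:
        f.write("placeholder config")
    return path


def supervisor_cleanup(conf_path):
    """Supervisor side (hypothetical): remove the topology's config file."""
    os.remove(conf_path)


def worker_start(conf_path):
    """Worker side (hypothetical): startup fails if the config is gone."""
    try:
        with open(conf_path) as f:
            f.read()
        return "started"
    except FileNotFoundError:
        return "failed: config missing"


def run(order):
    """Apply the steps in the given order and return the worker's outcome."""
    with tempfile.TemporaryDirectory() as root:
        conf = write_worker_conf(root, "demo-topology-1")
        result = None
        for step in order:
            if step == "cleanup":
                supervisor_cleanup(conf)
            elif step == "start":
                result = worker_start(conf)
        return result
```

Running run(["start", "cleanup"]) models the normal case, while
run(["cleanup", "start"]) models a worker that is relaunched after the
supervisor already removed its files.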
> Supervisor may not shut down workers cleanly
> --------------------------------------------
>
> Key: STORM-1879
> URL: https://issues.apache.org/jira/browse/STORM-1879
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.1
> Reporter: Stig Rohde Døssing
> Attachments: fix_missing_worker_pid.patch, nimbus-supervisor.zip,
> supervisor.log
>
>
> We've run into a strange issue with a zombie worker process. It looks like
> the worker pid file somehow got deleted without the worker process shutting
> down. This causes the supervisor to try repeatedly to kill the worker
> unsuccessfully, and means multiple workers may be assigned to the same port.
> The worker root folder sticks around because the worker is still heartbeating
> to it.
> It may or may not be related that we've seen Nimbus occasionally enter an
> infinite loop of printing logs similar to the below.
> {code}
> 2016-05-19 14:55:14.196 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.210 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.218 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.256 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.273 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.316 o.a.s.b.BlobStoreUtils [ERROR] Could not update the
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> {code}
> This continues until Nimbus is rebooted. We also see repeating blocks
> similar to the logs below.
> {code}
> 2016-06-02 07:45:03.656 o.a.s.d.nimbus [INFO] Cleaning up
> ZendeskTicketTopology-127-1464780171
> 2016-06-02 07:45:04.132 o.a.s.d.nimbus [INFO]
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormjar.jar)
> 2016-06-02 07:45:04.144 o.a.s.d.nimbus [INFO]
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormconf.ser)
> 2016-06-02 07:45:04.155 o.a.s.d.nimbus [INFO]
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormcode.ser)
> {code}