[ 
https://issues.apache.org/jira/browse/STORM-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15354803#comment-15354803
 ] 

Jungtaek Lim commented on STORM-1879:
-------------------------------------

Sorry for joining late. I have been tied up with other work.

I suspect that many of the supervisor issues come from a race condition 
(between sync-supervisor and sync-processes).

One of the supervisor tests is failing intermittently 
([STORM-1933|https://issues.apache.org/jira/browse/STORM-1933]), and after 
digging I found that the supervisor has a race condition which can cause various 
issues.
(What [~nico.meyer] pointed out seems to be the same as what STORM-1933 shows.)
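
To illustrate the kind of race I mean, here is a minimal sketch. It is not Storm's actual supervisor code and not necessarily what the patch does. The class, the method names and the in-memory maps are all hypothetical stand-ins: one routine replaces the local assignment (like sync-supervisor), the other reconciles running workers against it (like sync-processes). A single lock keeps the two from interleaving halfway through an update.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from Storm's code base.
class SupervisorSyncSketch {
    private final Object lock = new Object();
    // port -> worker id that should be running (stand-in for the local assignment)
    private final Map<Integer, String> assigned = new HashMap<>();
    // worker ids that currently have a process (stand-in for worker pid files)
    private final Set<String> running = new HashSet<>();

    // Stand-in for sync-supervisor: replace the local assignment as one unit.
    void syncSupervisor(Map<Integer, String> newAssignment) {
        synchronized (lock) {
            assigned.clear();
            assigned.putAll(newAssignment);
        }
    }

    // Stand-in for sync-processes: stop unassigned workers, start missing ones.
    void syncProcesses() {
        synchronized (lock) {
            running.retainAll(assigned.values()); // "kill" workers no longer assigned
            running.addAll(assigned.values());    // "launch" workers that are missing
        }
    }

    // Returns a copy so callers never see the set while it is being modified.
    Set<String> runningWorkers() {
        synchronized (lock) {
            return new HashSet<>(running);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SupervisorSyncSketch sketch = new SupervisorSyncSketch();
        Map<Integer, String> first = new HashMap<>();
        first.put(6700, "worker-a");
        Map<Integer, String> second = new HashMap<>();
        second.put(6700, "worker-b");

        // Two threads play the roles of the two periodic sync tasks.
        Thread assigner = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                sketch.syncSupervisor(i % 2 == 0 ? first : second);
            }
        });
        Thread reconciler = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                sketch.syncProcesses();
            }
        });
        assigner.start();
        reconciler.start();
        assigner.join();
        reconciler.join();

        // After a final assignment and reconcile, only the assigned worker remains.
        sketch.syncSupervisor(second);
        sketch.syncProcesses();
        System.out.println(sketch.runningWorkers());
    }
}
```

Without the shared lock, the reconcile step could observe the assignment between clear() and putAll() and treat every worker as unassigned, which is the sort of interleaving that can leave worker state inconsistent.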

I submitted a [patch|https://github.com/apache/storm/pull/1528] for 
[STORM-1934|https://issues.apache.org/jira/browse/STORM-1934], so I'd be really 
happy if you could apply my patch and see whether it works. 

> Supervisor may not shut down workers cleanly
> --------------------------------------------
>
>                 Key: STORM-1879
>                 URL: https://issues.apache.org/jira/browse/STORM-1879
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.1
>            Reporter: Stig Rohde Døssing
>         Attachments: fix_missing_worker_pid.patch, nimbus-supervisor.zip, 
> supervisor.log
>
>
> We've run into a strange issue with a zombie worker process. It looks like 
> the worker pid file somehow got deleted without the worker process shutting 
> down. This causes the supervisor to try repeatedly to kill the worker 
> unsuccessfully, and means multiple workers may be assigned to the same port. 
> The worker root folder sticks around because the worker is still heartbeating 
> to it.
> It may or may not be related that we've seen Nimbus occasionally enter an 
> infinite loop of printing logs similar to the below.
> {code}
> 2016-05-19 14:55:14.196 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.210 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.218 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.256 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.273 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.316 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> {code}
> Which continues until Nimbus is rebooted. We also see repeating blocks 
> similar to the logs below.
> {code}
> 2016-06-02 07:45:03.656 o.a.s.d.nimbus [INFO] Cleaning up 
> ZendeskTicketTopology-127-1464780171
> 2016-06-02 07:45:04.132 o.a.s.d.nimbus [INFO] 
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormjar.jar)
> 2016-06-02 07:45:04.144 o.a.s.d.nimbus [INFO] 
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormconf.ser)
> 2016-06-02 07:45:04.155 o.a.s.d.nimbus [INFO] 
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormcode.ser)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
