[ 
https://issues.apache.org/jira/browse/STORM-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342703#comment-15342703
 ] 

Jeyendran Balakrishnan commented on STORM-1879:
-----------------------------------------------

I'm encountering this problem too: a worker task fails for some reason, and 
the supervisor then gets stuck in a mode where it repeatedly, but 
unsuccessfully, tries to update the blob.
Disk activity goes to 100% (all write accesses), and the whole topology 
essentially stalls.
I have to stop and restart the topology and hope the problem does not recur 
too quickly.
My environment is a LocalCluster on a single Windows 8.1 laptop, with one 
spout and a total of 7 bolt instances (4 different bolts), with all acking 
turned off (ACK executors = 0).

> Supervisor may not shut down workers cleanly
> --------------------------------------------
>
>                 Key: STORM-1879
>                 URL: https://issues.apache.org/jira/browse/STORM-1879
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.1
>            Reporter: Stig Rohde Døssing
>         Attachments: fix_missing_worker_pid.patch, nimbus-supervisor.zip, 
> supervisor.log
>
>
> We've run into a strange issue with a zombie worker process. It looks like 
> the worker pid file somehow got deleted without the worker process shutting 
> down. This causes the supervisor to try repeatedly to kill the worker 
> unsuccessfully, and means multiple workers may be assigned to the same port. 
> The worker root folder sticks around because the worker is still heartbeating 
> to it.
> It may or may not be related that we've seen Nimbus occasionally enter an 
> infinite loop of printing logs similar to the below.
> {code}
> 2016-05-19 14:55:14.196 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.210 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.218 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> 2016-05-19 14:55:14.256 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.273 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormcode.ser
> 2016-05-19 14:55:14.316 o.a.s.b.BlobStoreUtils [ERROR] Could not update the 
> blob with keyZendeskTicketTopology-5-1463647641-stormconf.ser
> {code}
> Which continues until Nimbus is rebooted. We also see repeating blocks 
> similar to the logs below.
> {code}
> 2016-06-02 07:45:03.656 o.a.s.d.nimbus [INFO] Cleaning up 
> ZendeskTicketTopology-127-1464780171
> 2016-06-02 07:45:04.132 o.a.s.d.nimbus [INFO] 
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormjar.jar)
> 2016-06-02 07:45:04.144 o.a.s.d.nimbus [INFO] 
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormconf.ser)
> 2016-06-02 07:45:04.155 o.a.s.d.nimbus [INFO] 
> ExceptionKeyNotFoundException(msg:ZendeskTicketTopology-127-1464780171-stormcode.ser)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
