Jungtaek Lim created STORM-3122:
-----------------------------------

             Summary: FNFE due to race condition between "async localizer" and 
"update blob" timer thread
                 Key: STORM-3122
                 URL: https://issues.apache.org/jira/browse/STORM-3122
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.x
            Reporter: Jungtaek Lim
            Assignee: Jungtaek Lim


There's a race condition between the "async localizer" and the "update blob" 
timer thread.

When a worker shuts down, the reference count for its blobs drops to 0 and the 
supervisor removes the actual blob files. Meanwhile, the "update blob" timer 
thread tries to keep blobs updated for downloaded topologies. While updating a 
topology, it reads some of the already-downloaded blob files (such as 
stormconf), assuming they are still present. The async localizer breaks that 
assumption by deleting them concurrently, which results in a 
FileNotFoundException.

[~arunmahadevan] suggested an approach to fix this: "updateBlobsForTopology" 
can just catch the FileNotFoundException and skip updating the blobs when it 
can't find the stormconf. This looks like the simplest fix, so I'll provide a 
patch based on the suggestion.
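
A minimal sketch of the suggested "catch and skip" approach, not the actual 
patch. Everything except the name "updateBlobsForTopology" is an assumption 
made for illustration: the class name, the readBlobKeys helper, the boolean 
return value, and the plain-text conf file (the real stormconf is a serialized 
file read through Storm's own utilities).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class UpdateBlobsSketch {

    // Hypothetical helper: reads one blob key per line from the conf file.
    // FileReader throws FileNotFoundException if the file is already gone.
    static List<String> readBlobKeys(File stormConf) throws IOException {
        List<String> keys = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(stormConf))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String key = line.trim();
                if (!key.isEmpty()) {
                    keys.add(key);
                }
            }
        }
        return keys;
    }

    // Returns true if the update ran, false if it was skipped because the
    // conf file had been removed (for example, by a concurrent cleanup).
    static boolean updateBlobsForTopology(File stormConf) throws IOException {
        List<String> keys;
        try {
            keys = readBlobKeys(stormConf);
        } catch (FileNotFoundException e) {
            // The topology's files were deleted between scheduling and
            // reading; skip this round instead of failing the timer thread.
            return false;
        }
        for (String key : keys) {
            // Placeholder for the real per-blob update work.
            System.out.println("would update blob: " + key);
        }
        return true;
    }
}
```

Only FileNotFoundException is swallowed here; other IOExceptions still 
propagate, so unrelated I/O failures are not silently hidden.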

Btw, this doesn't apply to the master branch: there, all blobs are synced up 
separately (no need to read stormconf to enumerate topology-related blobs), and 
the update logic is already fault-tolerant (it skips to the next sync when it 
can't pull a blob).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
