Jungtaek Lim created STORM-3122:
-----------------------------------
Summary: FNFE due to race condition between "async localizer" and
"update blob" timer thread
Key: STORM-3122
URL: https://issues.apache.org/jira/browse/STORM-3122
Project: Apache Storm
Issue Type: Bug
Components: storm-core
Affects Versions: 1.x
Reporter: Jungtaek Lim
Assignee: Jungtaek Lim
There's a race condition between the "async localizer" and the "update blob"
timer thread. When a worker is shutting down, the reference count for its blobs
drops to 0 and the supervisor removes the actual blob files. Meanwhile, the
"update blob" timer thread tries to keep blobs updated for downloaded
topologies. While updating a topology it reads some of the already-downloaded
blob files (such as stormconf), assuming they are still present. The async
localizer breaks that assumption, which results in a FileNotFoundException.
[~arunmahadevan] suggested an approach to fix this: "updateBlobsForTopology"
can simply catch the FileNotFoundException and skip updating the blobs when it
can't find the stormconf. This looks like the simplest fix, so I'll provide a
patch based on the suggestion.
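A minimal sketch of the suggested approach, for illustration only. The class
name, the readBlobKeys helper, and the plain-text stormconf format are
assumptions made for this example. They are not the actual Storm 1.x code or
the eventual patch.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

public class BlobUpdateSketch {
    /** Hypothetical stand-in for reading the blob keys listed in a topology's stormconf. */
    static List<String> readBlobKeys(Path stormConf) throws IOException {
        try {
            return Files.readAllLines(stormConf);
        } catch (NoSuchFileException e) {
            // java.nio reports a missing file as NoSuchFileException;
            // rethrow as FileNotFoundException to mirror the failure in this issue.
            FileNotFoundException fnfe = new FileNotFoundException(stormConf.toString());
            fnfe.initCause(e);
            throw fnfe;
        }
    }

    /** Returns true if the blobs were updated, false if this round was skipped. */
    static boolean updateBlobsForTopology(Path stormConf) throws IOException {
        List<String> keys;
        try {
            keys = readBlobKeys(stormConf);
        } catch (FileNotFoundException e) {
            // stormconf was removed concurrently (worker shutting down): skip the update.
            return false;
        }
        for (String key : keys) {
            // Placeholder: a real implementation would refresh the blob for this key.
        }
        return true;
    }
}
```

With this shape, a stormconf that disappears between scheduling and reading
makes the timer thread skip that topology instead of failing.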
Btw, this doesn't apply to the master branch: there, all blobs are synced
separately (no need to read stormconf to enumerate topology-related blobs), and
the update logic is already fault-tolerant (it skips to the next sync when it
can't pull the blob).
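The per-blob behavior described for master can be pictured roughly as follows.
This is a sketch under assumptions: BlobPuller and syncAll are hypothetical
names invented for the example, not Storm interfaces.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PerBlobSyncSketch {
    /** Hypothetical callback that pulls one blob; not a Storm interface. */
    interface BlobPuller {
        void pull(String key) throws IOException;
    }

    /** Syncs every blob independently and returns the keys left for the next sync. */
    static List<String> syncAll(List<String> keys, BlobPuller puller) {
        List<String> deferred = new ArrayList<>();
        for (String key : keys) {
            try {
                puller.pull(key);
            } catch (IOException e) {
                // One failed pull does not abort the others; retry on the next sync.
                deferred.add(key);
            }
        }
        return deferred;
    }
}
```

Because each blob is handled on its own, a blob that vanishes mid-sync only
defers that one key rather than failing the whole update.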
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)