GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/storm/pull/2737
(1.x) STORM-3122 Avoid supervisor being crashed due to race condition
between "async localizer" and "update blob" timer thread
There's a race condition between the "async localizer" and the "update blob"
timer thread.
When a worker is shutting down, the reference count for a blob drops to 0 and
the supervisor removes the actual blob file. There's also the "update blob" timer
thread, which tries to keep blobs updated for downloaded topologies. While
updating a topology, it reads some of the blob files already downloaded, assuming
these files were downloaded earlier and are still present. That assumption is
broken because of the async localizer.
@arunmahadevan suggested an approach to fix this: "updateBlobsForTopology"
can just catch the FileNotFoundException and skip updating the blobs in case it
can't find the stormconf. This looks like the simplest fix, so I'll provide a
patch based on the suggestion.
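A minimal sketch of the suggested approach, for illustration only. This is not
the code in the patch: the class name, the readBlobKeys helper, the one-key-per-line
file format, and the boolean return value are all assumptions made for this
example. Only the idea of catching FileNotFoundException and skipping the update
comes from the description above.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in; NOT Storm's actual supervisor code.
class BlobUpdateSketch {

    // Hypothetical helper: reads one blob key per line from a stand-in
    // "stormconf" file. The FileReader constructor throws
    // FileNotFoundException if the file has already been removed.
    static List<String> readBlobKeys(File stormConf) throws IOException {
        List<String> keys = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(stormConf))) {
            String line;
            while ((line = reader.readLine()) != null) {
                keys.add(line);
            }
        }
        return keys;
    }

    // Returns true if the update ran, false if it was skipped because the
    // stormconf file was missing (for example, removed while the worker
    // was shutting down).
    static boolean updateBlobsForTopology(File stormConf) throws IOException {
        List<String> keys;
        try {
            keys = readBlobKeys(stormConf);
        } catch (FileNotFoundException e) {
            // The file disappeared between scheduling and reading:
            // skip this round instead of letting the exception escape.
            return false;
        }
        for (String key : keys) {
            // Placeholder: the real update work for each blob would go here.
        }
        return true;
    }
}
```

Calling updateBlobsForTopology with a path that no longer exists returns false
instead of propagating the exception, so in this sketch the timer thread would
simply move on to the next topology.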
Btw, this doesn't apply to the master branch, since in master all blobs
are synced up separately (no need to read stormconf to enumerate
topology-related blobs), and the update logic is already fault-tolerant (it
skips to the next sync when it can't pull the blob).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HeartSaVioR/storm STORM-3122-1.x
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2737.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2737
----
commit 84d19c9ad66e2d24040c7e12dc96cef03ff7bcb3
Author: Jungtaek Lim <kabhwan@...>
Date: 2018-06-24T21:49:51Z
STORM-3122 Avoid supervisor being crashed due to race condition between
"async localizer" and "update blob" timer thread
----