GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/2737

    (1.x) STORM-3122 Avoid supervisor being crashed due to race condition 
between "async localizer" and "update blob" timer thread

    There's a race condition between the "async localizer" and the "update 
blob" timer thread.
    
    When a worker is shutting down, the reference count for a blob drops to 0 
and the supervisor removes the actual blob file. There's also an "update blob" 
timer thread which tries to keep blobs updated for downloaded topologies. While 
updating a topology, it reads some of the blob files on the assumption that 
they have already been downloaded, and that assumption no longer holds because 
of the async localizer.
    
    @arunmahadevan suggested an approach to fix this: "updateBlobsForTopology" 
can just catch the FileNotFoundException and skip updating the blobs in case it 
can't find the stormconf. This looks like the simplest fix, so I'll provide a 
patch based on the suggestion.
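
    The suggested approach can be sketched roughly as below. This is only an 
illustration of the catch-and-skip idea, not the code in the patch: the class 
name, the readStormConf helper, the boolean return value, and the 
one-blob-key-per-line file format are all assumptions made up for the example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

public class BlobUpdateSketch {

    // Hypothetical helper: reads the topology conf and returns the blob keys
    // it lists. The real stormconf is not a plain text file; one key per line
    // is an assumption that keeps the sketch self-contained.
    static List<String> readStormConf(Path stormConf) throws IOException {
        if (!Files.exists(stormConf)) {
            throw new FileNotFoundException(stormConf.toString());
        }
        return Files.readAllLines(stormConf);
    }

    // Returns true if the update ran, false if it was skipped because the
    // stormconf was missing (for example, removed while the worker shut down).
    static boolean updateBlobsForTopology(Path stormConf) throws IOException {
        List<String> blobKeys;
        try {
            blobKeys = readStormConf(stormConf);
        } catch (FileNotFoundException | NoSuchFileException e) {
            // The file disappeared between scheduling and reading: skip this
            // round instead of letting the exception kill the timer thread.
            return false;
        }
        for (String key : blobKeys) {
            // Placeholder for the per-blob update; omitted in this sketch.
        }
        return true;
    }
}
```

    Other IOExceptions still propagate, so only the "file is gone" case is 
treated as a reason to skip.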
    
    Btw, this doesn't apply to the master branch, since in master all blobs 
are synced up separately (no need to read stormconf to enumerate topology 
related blobs), and the update logic is already fault-tolerant (it skips to the 
next sync when it can't pull the blob).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-3122-1.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2737
    
----
commit 84d19c9ad66e2d24040c7e12dc96cef03ff7bcb3
Author: Jungtaek Lim <kabhwan@...>
Date:   2018-06-24T21:49:51Z

    STORM-3122 Avoid supervisor being crashed due to race condition between 
"async localizer" and "update blob" timer thread

----

