agresch opened a new pull request #3370:
URL: https://github.com/apache/storm/pull/3370


   ## What is the purpose of the change
   
   When the local assignment for a slot changes, the blob references are not 
updated, so the slot's PNA no longer matches the PNA stored as the blob 
reference. When the topology is killed, the blob tries to remove the slot's 
current PNA and cannot find a matching reference. The AsyncLocalizer still 
holds the old slot PNA as a reference, so it cannot release the blob.
   
   The local assignment can change when, for example, the same executors are 
reordered after a Nimbus restart.
   
   The fix is a change to removeReference() in LocallyCachedBlob: it now looks 
for an equivalent PNA object rather than relying on equals(). This PR also 
adds logging to help track down this issue. I don't think it is excessive, 
and given how long this problem has existed, it is important to have.
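
   To illustrate the difference between removing a reference by equals() and 
by equivalence, here is a minimal, self-contained sketch. It is not the code 
in this PR: the Pna and CachedBlob classes, their fields, the method names, 
and the rule that "equivalent" means same port and same topology id are all 
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Illustrative sketch only; these are NOT Storm's real classes.
final class ReferenceSketch {

    /** Hypothetical stand-in for a slot's PNA: a port plus assignment details. */
    static final class Pna {
        final int port;
        final String topologyId;
        final List<Integer> executors;

        Pna(int port, String topologyId, List<Integer> executors) {
            this.port = port;
            this.topologyId = topologyId;
            this.executors = new ArrayList<>(executors);
        }

        // Strict equality: executor order matters, so a reordered
        // assignment is not equal to the original one.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Pna)) {
                return false;
            }
            Pna other = (Pna) o;
            return port == other.port
                && topologyId.equals(other.topologyId)
                && executors.equals(other.executors);
        }

        @Override
        public int hashCode() {
            return Objects.hash(port, topologyId, executors);
        }

        // Assumed equivalence rule: same slot (port) and same topology.
        boolean isEquivalentTo(Pna other) {
            return port == other.port && topologyId.equals(other.topologyId);
        }
    }

    /** Hypothetical stand-in for a cached blob that tracks its references. */
    static final class CachedBlob {
        private final List<Pna> references = new ArrayList<>();

        void addReference(Pna pna) {
            references.add(pna);
        }

        // Old behaviour in this sketch: only removes a reference that equals() the argument.
        boolean removeReferenceStrict(Pna pna) {
            return references.remove(pna);
        }

        // New behaviour in this sketch: removes any reference equivalent to the argument.
        boolean removeReferenceEquivalent(Pna pna) {
            return references.removeIf(ref -> ref.isEquivalentTo(pna));
        }

        boolean isUsed() {
            return !references.isEmpty();
        }
    }

    public static void main(String[] args) {
        CachedBlob blob = new CachedBlob();
        blob.addReference(new Pna(6700, "topo-1", List.of(1, 2)));

        // Same slot and topology, but the executors were reordered.
        Pna reordered = new Pna(6700, "topo-1", List.of(2, 1));

        System.out.println("strict removal found a match: "
            + blob.removeReferenceStrict(reordered));
        System.out.println("equivalent removal found a match: "
            + blob.removeReferenceEquivalent(reordered));
        System.out.println("blob still referenced: " + blob.isUsed());
    }
}
```

   In this sketch the strict removal leaves the stale reference in place, 
which mirrors the situation described above where the blob can never be 
released, while the equivalence-based removal clears it.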
   
   ## How was the change tested
   
   Ran the storm-server unit tests. This change has also been tested 
internally with integration tests and on our staging clusters for a couple of 
weeks, where we validated that we no longer get AsyncLocalizer alerts for this 
condition.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

