srdo opened a new pull request #2998:  STORM-3379: Fix intermittent NPE during 
worker boot in local mode 
URL: https://github.com/apache/storm/pull/2998
 
 
   Contains https://github.com/apache/storm/pull/2996; please review that first.
   
   This issue is caused by a race. When Nimbus assigns executors to a 
supervisor, it writes them into Zookeeper. The supervisor is also told via 
Thrift (in local mode via a handle directly to Nimbus). When the worker starts 
and initializes WorkerState, it normally uses Thrift to ask the supervisor 
about the assignment. If that fails, it tries Zookeeper. The read from 
Zookeeper is only a best-effort attempt to start the worker; it is 
inherently racy.
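
   A minimal sketch of the lookup order described above: ask the supervisor first, and fall back to a best-effort read from the state store if that call fails. All names here (`AssignmentLookupSketch`, `lookup`, the two suppliers) are hypothetical stand-ins, not Storm's actual classes, and treating any exception or a null answer as "failed" is an assumption.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Storm's real API.
final class AssignmentLookupSketch {

    /**
     * Asks the supervisor first. If that call throws or yields nothing,
     * falls back to a best-effort read from the state store, which may
     * legitimately come back empty.
     */
    static <T> Optional<T> lookup(Supplier<T> askSupervisor, Supplier<T> readStateStore) {
        try {
            T fromSupervisor = askSupervisor.get();
            if (fromSupervisor != null) {
                return Optional.of(fromSupervisor);
            }
        } catch (RuntimeException e) {
            // Assumption: any failure of the primary call triggers the fallback.
        }
        // Best effort only: the value may not be visible yet, so wrap the null.
        return Optional.ofNullable(readStateStore.get());
    }

    public static void main(String[] args) {
        Optional<String> direct = lookup(() -> "from-supervisor", () -> "from-store");
        Optional<String> viaFallback = lookup(
            () -> { throw new IllegalStateException("supervisor unreachable"); },
            () -> null);
        System.out.println(direct);
        System.out.println(viaFallback);
    }
}
```

   Returning an `Optional` makes the "nothing there yet" case explicit to the caller rather than leaving a null to be dereferenced later.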
   
   In local mode, the supervisor Thrift server isn't running, so the 
WorkerState just skips straight to reading from Zookeeper. Since the supervisor 
is told by Nimbus about the assignment directly, there is no guarantee that the 
assignment becomes visible in Zookeeper before the WorkerState tries reading 
it. When the read comes first, the assignment is missing, which causes the NPE.
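
   The bad interleaving can be illustrated with a single-threaded simulation. This is only a sketch: the map stands in for Zookeeper, and `LocalModeRaceSketch`, `simulate`, and the key and value strings are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; the map is a stand-in for the shared state store.
final class LocalModeRaceSketch {

    private static final String KEY = "assignment";

    /**
     * Replays one of the two possible orderings and returns what the
     * worker's read observed. When the read happens before the write
     * is visible, the result is null.
     */
    static String simulate(boolean writeVisibleBeforeRead) {
        Map<String, String> stateStore = new ConcurrentHashMap<>();
        if (writeVisibleBeforeRead) {
            stateStore.put(KEY, "executors [1-4]");
        }
        // The worker was started via the direct notification and reads now.
        String seenByWorker = stateStore.get(KEY);
        if (!writeVisibleBeforeRead) {
            // The write lands only after the worker has already looked.
            stateStore.put(KEY, "executors [1-4]");
        }
        return seenByWorker;
    }

    public static void main(String[] args) {
        System.out.println(simulate(true));
        String raced = simulate(false);
        // Dereferencing 'raced' without a null check would throw an NPE.
        System.out.println(raced == null ? "assignment missing" : raced);
    }
}
```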
   
   The fix is to provide the Worker with a handle to the supervisor Thrift 
interface in local mode, similar to how the supervisor gets a handle to Nimbus. 
WorkerState can then ask the supervisor directly about the assignment. 
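
   The shape of the fix might look roughly like the sketch below: the worker-side state accepts an optional in-process handle and only builds a remote client when no handle is given. `SupervisorApi`, `WorkerStateSketch`, and `assignmentFor` are hypothetical names for illustration; the actual interfaces in this PR may differ.

```java
import java.util.function.Supplier;

// Hypothetical sketch; names and signatures are not taken from Storm.
final class LocalSupervisorHandleSketch {

    /** Minimal stand-in for the supervisor interface the worker talks to. */
    interface SupervisorApi {
        String getAssignment(String topologyId);
    }

    /** Worker-side state that prefers an in-process supervisor handle. */
    static final class WorkerStateSketch {
        private final SupervisorApi supervisor;

        WorkerStateSketch(SupervisorApi localHandle, Supplier<SupervisorApi> remoteClientFactory) {
            // Local mode passes a direct handle; otherwise connect remotely.
            // The factory is only invoked when no local handle exists.
            this.supervisor = localHandle != null ? localHandle : remoteClientFactory.get();
        }

        String assignmentFor(String topologyId) {
            return supervisor.getAssignment(topologyId);
        }
    }

    public static void main(String[] args) {
        SupervisorApi inProcess = id -> "assignment for " + id;
        WorkerStateSketch state = new WorkerStateSketch(
            inProcess,
            () -> { throw new IllegalStateException("no remote server in local mode"); });
        System.out.println(state.assignmentFor("topo-1"));
    }
}
```

   With the handle in place, the worker never needs the racy state-store read in local mode, because the supervisor already holds the assignment it was told about.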
   
   I ran the tests in TestPlanCompiler (which runs a local cluster) a few 
hundred times, and they appear stable with this fix and STORM-3376. 
