ptlrs opened a new pull request, #9997:
URL: https://github.com/apache/ozone/pull/9997

   ## What changes were proposed in this pull request?
   `XceiverClientGrpc#connectToDatanode` intermittently fails with an NPE. 
   The cause is a race condition, for a given datanode, between 
creating its channel and creating its stub.
   
   When a new channel is created for a DN, it is put into the `channels` map. 
However, the presence of a channel in the map does not imply that the corresponding 
stub for the same DN also exists in the `asyncStubs` map. 
   
   If the stub is looked up after the channel has been created but before the 
stub has been created, the lookup finds no stub and we can get an NPE.
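
The window described above can be illustrated with a minimal, single-threaded sketch that replays the problematic interleaving step by step. This is not the actual `XceiverClientGrpc` code: `Channel`, `Stub`, `publishChannel`, `publishStub` and `stubIfConnected` are hypothetical stand-ins, and only the map names `channels` and `asyncStubs` come from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TwoMapRace {
    // Hypothetical stand-ins for the real gRPC channel and stub types.
    static final class Channel {
        final String dnId;
        Channel(String dnId) { this.dnId = dnId; }
    }
    static final class Stub {
        final Channel channel;
        Stub(Channel channel) { this.channel = channel; }
    }

    // Two independent maps, as in the code before this PR.
    static final Map<String, Channel> channels = new ConcurrentHashMap<>();
    static final Map<String, Stub> asyncStubs = new ConcurrentHashMap<>();

    // Step 1 of connecting: publish the channel.
    static void publishChannel(String dnId) {
        channels.put(dnId, new Channel(dnId));
    }

    // Step 2 of connecting: publish the stub for the existing channel.
    static void publishStub(String dnId) {
        asyncStubs.put(dnId, new Stub(channels.get(dnId)));
    }

    // A reader that treats "channel present" as "fully connected".
    static Stub stubIfConnected(String dnId) {
        if (channels.containsKey(dnId)) {
            return asyncStubs.get(dnId); // null between step 1 and step 2
        }
        return null;
    }

    public static void main(String[] args) {
        publishChannel("dn1");
        // Another thread could run exactly here and observe a missing stub.
        System.out.println("stub in window: " + stubIfConnected("dn1"));
        publishStub("dn1");
        System.out.println("stub present afterwards: " + (stubIfConnected("dn1") != null));
    }
}
```

Each map is thread-safe on its own, but nothing makes the two `put` calls atomic as a pair, so a reader that dereferences the result of `stubIfConnected` inside the window would throw an NPE.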
   
   This PR fixes the problem by:
   - maintaining only one `dnChannelInfoMap` for both the channels and stubs 
instead of two independent maps
   - creating a `ChannelInfo` class to group the channel and stub
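
A rough sketch of the approach listed above, under stated assumptions: the names `dnChannelInfoMap` and `ChannelInfo` come from this description, while `Channel`, `Stub`, `connect`, the accessor names and the use of `computeIfAbsent` are illustrative guesses and may differ from the actual patch.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

final class SingleMapSketch {
    // Hypothetical stand-ins for the real gRPC channel and stub types.
    static final class Channel {
        final String dnId;
        Channel(String dnId) { this.dnId = dnId; }
    }
    static final class Stub {
        final Channel channel;
        Stub(Channel channel) { this.channel = channel; }
    }

    // Immutable pair: a ChannelInfo cannot exist without both parts.
    static final class ChannelInfo {
        private final Channel channel;
        private final Stub stub;

        ChannelInfo(Channel channel, Stub stub) {
            this.channel = Objects.requireNonNull(channel, "channel");
            this.stub = Objects.requireNonNull(stub, "stub");
        }

        Channel getChannel() { return channel; }
        Stub getStub() { return stub; }
    }

    // One map for both the channel and the stub of each datanode.
    static final Map<String, ChannelInfo> dnChannelInfoMap = new ConcurrentHashMap<>();

    // Assumed helper: creates channel and stub together and publishes them
    // as one entry, so readers see either nothing or a complete pair.
    static ChannelInfo connect(String dnId) {
        return dnChannelInfoMap.computeIfAbsent(dnId, id -> {
            Channel channel = new Channel(id);
            return new ChannelInfo(channel, new Stub(channel));
        });
    }

    public static void main(String[] args) {
        ChannelInfo info = connect("dn1");
        System.out.println("stub bound to its channel: "
            + (info.getStub().channel == info.getChannel()));
    }
}
```

Because the channel and the stub are published as a single map value, there is no intermediate state in which one is visible without the other.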
   
   ## What is the link to the Apache JIRA?
   https://issues.apache.org/jira/browse/HDDS-14793
   
   ## How was this patch tested?
   
   CI: https://github.com/ptlrs/ozone/actions/runs/23703558972


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

