cfmcgrady opened a new pull request, #2566: URL: https://github.com/apache/celeborn/pull/2566
### What changes were proposed in this pull request?

Backport https://github.com/apache/celeborn/pull/2548 to branch-0.4.

Fix the thread-safety bug in `getMetrics` of `AbstractSource` by changing the lock scope.

### Why are the changes needed?

When two threads call the `getMetrics` method of `AbstractSource` at the same time, one of them may get fewer metrics than actually exist, because execution can interleave as follows:

1. Thread A gets the lock, adds the metrics of the worker source to the `innerMetrics` queue, and releases the lock.
2. Thread B gets the lock, adds the metrics of the worker source to the `innerMetrics` queue, and releases the lock.
3. Thread A gets the lock, adds the metrics of the other sources to the `innerMetrics` queue, assembles the values of `innerMetrics`, clears `innerMetrics`, and releases the lock.
4. Thread B gets the lock, adds the metrics of the other sources to the `innerMetrics` queue, assembles the values of `innerMetrics`, clears `innerMetrics`, and releases the lock.

As a result, Thread A gets two sets of metrics data from the worker source, while Thread B gets none.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.
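
For illustration, the interleaving described under "Why are the changes needed?" can be replayed deterministically on a single thread. The following is a minimal sketch, not the actual Celeborn code: the class `MetricsSourceSketch`, its method names, and the metric strings are hypothetical, and the real `AbstractSource` is assumed only to share the general shape of a buffer that is filled, assembled, and cleared under a lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metrics source. Names are illustrative
// and are NOT taken from Celeborn's AbstractSource.
class MetricsSourceSketch {
    private final List<String> innerMetrics = new ArrayList<>();
    private final Object lock = new Object();

    // Problematic shape, step 1: append the worker metrics under the lock,
    // then release it before the rest of the work is done.
    void appendWorkerMetrics() {
        synchronized (lock) {
            innerMetrics.add("worker_metric");
        }
    }

    // Problematic shape, step 2: re-acquire the lock, append the remaining
    // metrics, assemble the buffer into one string, and clear the buffer.
    String appendOthersAndDrain() {
        synchronized (lock) {
            innerMetrics.add("other_metric");
            String out = String.join("\n", innerMetrics);
            innerMetrics.clear();
            return out;
        }
    }

    // Fixed shape: append, assemble, and clear inside one critical section,
    // so no other caller can touch the buffer in between.
    String getMetrics() {
        synchronized (lock) {
            innerMetrics.add("worker_metric");
            innerMetrics.add("other_metric");
            String out = String.join("\n", innerMetrics);
            innerMetrics.clear();
            return out;
        }
    }

    // Replays the A1, B1, A2, B2 ordering on one thread to show what each
    // caller would receive when the lock is released between the two steps.
    static String[] replayProblematicInterleaving() {
        MetricsSourceSketch source = new MetricsSourceSketch();
        source.appendWorkerMetrics();                      // Thread A, step 1
        source.appendWorkerMetrics();                      // Thread B, step 1
        String seenByA = source.appendOthersAndDrain();    // Thread A, step 2
        String seenByB = source.appendOthersAndDrain();    // Thread B, step 2
        return new String[] {seenByA, seenByB};
    }
}
```

In this sketch, the replay gives the first caller the worker metric twice and the second caller no worker metric at all, while `getMetrics` returns one complete set per call because the buffer is never visible to another caller half-filled.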
