kerneltime commented on PR #7882: URL: https://github.com/apache/ozone/pull/7882#issuecomment-2683544975
> > The original issue was a replica showing up with a zero BCSID, causing the heartbeat to not get processed. The equality precondition was not covering some legitimate scenarios. That said, if a replica shows up with a higher BCSID, should the container state not be updated with the higher BCSID? I am OK with this change in itself, but it is not complete in terms of handling variations in BCSIDs across replicas.
> >
> > I am guessing this PR is only to address SCM not processing the remaining containers in the heartbeat, and not about dealing with varying BCSIDs causing crashes.
>
> @kerneltime Yes, this Jira only deals with the consequence of bad bcsId replicas; it does not solve the root cause. The root cause is addressed in [HDDS-12232](https://issues.apache.org/jira/browse/HDDS-12232). However, if a cluster already suffers from bad bcsId replicas, it needs this patch to get out of that state (the patch ignores those bad replicas without crashing the entire container report handler; note that the bad bcsId replicas can't fix themselves).

Ack. Also https://issues.apache.org/jira/browse/HDDS-12171
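
For readers following the thread, the "ignore the bad replica and keep processing the rest of the report" behaviour discussed above can be sketched roughly as follows. This is a language-neutral illustration in Python, not Ozone code: `Replica`, `validate`, `process_report`, and the zero-BCSID rule are all hypothetical names and assumptions made for the example, and the actual check in the PR may differ.

```python
from dataclasses import dataclass
import logging

log = logging.getLogger("container_report_sketch")


@dataclass
class Replica:
    container_id: int
    bcsid: int


def validate(replica, known_bcsid):
    """Hypothetical rule: a zero BCSID is 'bad' if the container is known
    to have a non-zero BCSID. Raises ValueError for a bad replica."""
    expected = known_bcsid.get(replica.container_id, 0)
    if replica.bcsid == 0 and expected != 0:
        raise ValueError(
            f"container {replica.container_id}: replica BCSID 0, expected {expected}"
        )


def process_report(replicas, known_bcsid):
    """Process every replica in a report. A bad replica is logged and
    skipped instead of aborting the whole report."""
    accepted, skipped = [], []
    for replica in replicas:
        try:
            validate(replica, known_bcsid)
        except ValueError as err:
            # Skip only this replica; the remaining entries are still handled.
            log.warning("ignoring replica: %s", err)
            skipped.append(replica)
            continue
        accepted.append(replica)
    return accepted, skipped


report = [Replica(1, 10), Replica(2, 0), Replica(3, 7)]
known = {1: 10, 2: 5, 3: 7}
accepted, skipped = process_report(report, known)
```

The point of the sketch is the placement of the `try`/`except` inside the loop: a failed check costs one replica rather than every container that follows it in the same heartbeat. It deliberately says nothing about replicas reporting a higher BCSID, which is the open question raised in the quoted comment.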
