Re: [PR] HDDS-12150. Abnormal container states should not crash the SCM ContainerReportHandler thread [ozone]

via GitHub Tue, 25 Feb 2025 15:49:51 -0800


kerneltime commented on PR #7882:
URL: https://github.com/apache/ozone/pull/7882#issuecomment-2683544975


   > > The original issue was a replica showing up with a zero BCSID, causing 
the heartbeat to not get processed. The equality precondition was not covering 
some legitimate scenarios. That said, if a replica shows up with a higher BCSID, 
should the container state not be updated with the higher BCSID? I am OK with 
this change in itself, but it is not complete in terms of handling variations in 
BCSIDs across replicas.
   > > I am guessing this PR is only to address SCM not processing the remaining 
containers in the heartbeat and not about dealing with varying BCSIDs causing 
crashes.
   > 
   > @kerneltime Yes, this Jira only deals with the consequence of bad bcsId 
replicas; it does not solve the root cause. The root cause is addressed in 
[HDDS-12232](https://issues.apache.org/jira/browse/HDDS-12232). However, if a 
cluster already suffers from bad bcsId replicas, it needs this patch to get out 
of that state: the patch ignores those bad replicas without crashing the entire 
container report handler. Note that the bad bcsId replicas can't fix themselves.
   
   Ack. Also https://issues.apache.org/jira/browse/HDDS-12171
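
   For readers following the thread, the behaviour described in the quoted reply (skip a bad replica, keep processing the rest of the report) can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `ReplicaReport`, `ReportResult`, `ContainerReportSketch`, and the "BCSID must equal the expected value" rule are all hypothetical stand-ins, not Ozone's real classes or its actual validity check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one replica entry in a container report.
final class ReplicaReport {
  final long containerId;
  final long bcsId;

  ReplicaReport(long containerId, long bcsId) {
    this.containerId = containerId;
    this.bcsId = bcsId;
  }
}

// Hypothetical result holder: which containers were handled and which were skipped.
final class ReportResult {
  final List<Long> processed = new ArrayList<>();
  final List<Long> skipped = new ArrayList<>();
}

final class ContainerReportSketch {
  // Assumed rule, for illustration only: a replica is rejected when its
  // BCSID differs from the expected value. The real check may differ.
  static void processReplica(ReplicaReport replica, long expectedBcsId) {
    if (replica.bcsId != expectedBcsId) {
      throw new IllegalStateException("Unexpected BCSID " + replica.bcsId
          + " for container " + replica.containerId);
    }
  }

  // Handles each replica independently, so one bad replica is recorded
  // and skipped instead of aborting the rest of the report.
  static ReportResult processReport(List<ReplicaReport> report, long expectedBcsId) {
    ReportResult result = new ReportResult();
    for (ReplicaReport replica : report) {
      try {
        processReplica(replica, expectedBcsId);
        result.processed.add(replica.containerId);
      } catch (IllegalStateException e) {
        result.skipped.add(replica.containerId);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<ReplicaReport> report = Arrays.asList(
        new ReplicaReport(1L, 10L),
        new ReplicaReport(2L, 0L),   // zero BCSID, as in the original issue
        new ReplicaReport(3L, 10L));
    ReportResult result = processReport(report, 10L);
    System.out.println("processed=" + result.processed + " skipped=" + result.skipped);
  }
}
```

   The point of the sketch is only the placement of the `try`/`catch` inside the loop: the failure is contained to the one replica, which matches the "ignore bad replicas without crashing the handler" intent stated above. How the higher-BCSID case should be handled is left open here, as it is in the thread.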


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


