devmadhuu commented on PR #9472:
URL: https://github.com/apache/ozone/pull/9472#issuecomment-3695400905

   > There is something about storing the health state in ContainerInfo that 
doesn't feel correct to me. The state is captured at a point in time and then 
it's stale soon afterwards. It doesn't get updated until the next run of RM. The 
thing about the report object was that it captured the stats of a complete run 
of RM, but in the new way, the container infos get changed as RM runs. I guess 
it should be pretty fast, but it could lead to somewhat unstable numbers.
   > 
   > I cannot really come up with a concrete reason as to why I think using 
ContainerInfo for this is wrong, aside from it only being updated by RM with 
its periodic runs. Perhaps I am overthinking it and it's fine.
   > 
   > I think there are also some places that use RM in a read-only mode to 
check container states (decommission maybe), so that may update the 
ContainerInfo states between RM runs. I am not sure if that is a problem or 
not. Probably not, as it can only make the state more current.
   > 
   > Aside from the above, from scanning the PR quickly, the thing I am not sure 
about is multiplying up the states - like under_replicated, 
unhealthy_under_replicate, qc_under_replicated ... It leads to a lot more 
states that may just be more confusing than helpful.
   > 
   > In the RM report, we tried to only have a container in a single state, but 
it can be unhealthy and under / over replicated. It can be missing and 
under-replicated, I think. Missing is kind of an extreme version of 
under-replicated. The only way to capture these "double states" with a single 
field is to multiply up the states, I guess.
   
   Yes, with this new approach, the ContainerInfo object will hold its state 
(multiple or just single), and that state may change between two RM runs. But 
in real time that may also be a good thing, as it will reflect the current 
state even in read-only mode. Could you please summarize your points so I can 
better understand how the current behavior may be an issue? As per my 
understanding, it should not be an issue, but I would like to understand your 
reasoning in more depth.
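
On the "double states" point in the quoted comment, here is a rough sketch of the two shapes being discussed: one enum constant per combination versus a set of base conditions. All names here (`HealthStateSketch`, `Condition`, `CombinedState`, `describe`) are hypothetical and for illustration only; they are not the classes or enums used in this PR or elsewhere in Ozone.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collectors;

public class HealthStateSketch {

    // Hypothetical base conditions a container might be in (an assumption,
    // not the actual Ozone enum).
    public enum Condition { UNHEALTHY, MISSING, UNDER_REPLICATED, OVER_REPLICATED }

    // Option A: a single field, so every combination needs its own constant.
    // The list grows with each new pair of conditions that can co-occur.
    public enum CombinedState {
        HEALTHY,
        UNHEALTHY,
        UNDER_REPLICATED,
        OVER_REPLICATED,
        UNHEALTHY_UNDER_REPLICATED,
        UNHEALTHY_OVER_REPLICATED,
        MISSING_UNDER_REPLICATED
    }

    // Option B: a set of base conditions. An empty set means healthy, and
    // combinations need no extra constants.
    public static String describe(Set<Condition> conditions) {
        if (conditions.isEmpty()) {
            return "HEALTHY";
        }
        // EnumSet iterates in declaration order, so the label is stable
        // regardless of the order in which conditions were added.
        return conditions.stream()
            .map(Enum::name)
            .collect(Collectors.joining("_"));
    }

    public static void main(String[] args) {
        Set<Condition> state =
            EnumSet.of(Condition.UNDER_REPLICATED, Condition.UNHEALTHY);
        System.out.println(describe(state));
        System.out.println(describe(EnumSet.noneOf(Condition.class)));
    }
}
```

Option B keeps the number of constants fixed, but it no longer fits in a single scalar field, which may matter for how the state is stored and reported.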


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

