tanganellilore commented on issue #8918:
URL: https://github.com/apache/cloudstack/issues/8918#issuecomment-2304416773

   Hi team,
   I see the same issue in my test case.
   If I simulate a disruption by powering off the server, the iDRAC command fails.
   
   If I simulate a disruption where the machine is powered off and the iDRAC is
not reachable, the iDRAC command fails as well.
   
   In both cases the host remains in "fencing" indefinitely, until I restart the
server or the iDRAC becomes reachable again.
   In all tests, all VMs on this host stay on it until I manually mark the node
as degraded from the UI.
   
   An OOBM command against a powered-off host returns exit code 1 (as the reset
command does in the recovery state).
   
   I read some of your code and I think the error could be in these two pieces:
   
https://github.com/apache/cloudstack/blob/b215abc30a22d6b11f016b8f402981445140f577/server/src/main/java/org/apache/cloudstack/ha/HAManagerImpl.java#L523-L529
   and
   
https://github.com/apache/cloudstack/blob/b215abc30a22d6b11f016b8f402981445140f577/server/src/main/java/org/apache/cloudstack/ha/task/FenceTask.java#L48-L53
   
   because the function always returns true in the `fencing` state and, since
the OOBM call throws an exception, the result will be `false` and we never get
out of the fencing "loop".
   
   One workaround could be to introduce a limit on how many attempts we wait
for before the host changes state to `ineligible` or `disabled`, as is done for
recovery, or to handle the reset and powerOff OOBM errors so that no exception
is thrown.
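
   To make the idea concrete, here is a rough sketch of a bounded fence loop.
It is not taken from the CloudStack code: `BoundedFenceSketch`,
`fenceWithLimit`, `Outcome` and the `maxAttempts` parameter are all
hypothetical names, and the `BooleanSupplier` only stands in for whatever call
actually issues the OOBM power-off. It assumes an OOBM failure surfaces as a
runtime exception and counts that as a failed attempt instead of letting it
propagate.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch only; none of these names exist in CloudStack. */
public class BoundedFenceSketch {

    /** Result of the bounded fence loop. */
    enum Outcome { FENCED, GAVE_UP }

    /**
     * Runs the fence operation at most maxAttempts times.
     * An exception from the (assumed) OOBM call counts as a failed attempt.
     */
    static Outcome fenceWithLimit(BooleanSupplier fenceOnce, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (fenceOnce.getAsBoolean()) {
                    return Outcome.FENCED;
                }
            } catch (RuntimeException e) {
                // e.g. OOBM unreachable or exit code 1: swallow and retry
            }
        }
        // Attempts exhausted: the caller would move the host out of
        // fencing (for example to ineligible or disabled).
        return Outcome.GAVE_UP;
    }

    public static void main(String[] args) {
        // Simulates a powered-off host whose OOBM command always fails.
        BooleanSupplier alwaysFails = () -> {
            throw new IllegalStateException("OOBM exit code 1");
        };
        System.out.println(fenceWithLimit(alwaysFails, 3));
    }
}
```

   With a limit like this, a host whose OOBM never answers would leave the
fencing state after a fixed number of tries instead of staying there until
someone intervenes.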
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

