tanganellilore commented on issue #8918: URL: https://github.com/apache/cloudstack/issues/8918#issuecomment-2304416773
Hi team, I see the same issue in my test case.

If I simulate a disruption by powering off the server, the iDRAC command fails. If I simulate a disruption where the machine is powered off and iDRAC is not reachable, the iDRAC command also fails. In both cases the host stays in "Fencing" indefinitely, until I restart the server or iDRAC becomes reachable again. In all tests, every VM on this host stays on this host until I manually mark the node as degraded from the UI.

An OOBM command against a powered-off host returns exit code 1 (as with reset in the recovery state).

I read some of your code, and I think the error may be in these two pieces: https://github.com/apache/cloudstack/blob/b215abc30a22d6b11f016b8f402981445140f577/server/src/main/java/org/apache/cloudstack/ha/HAManagerImpl.java#L523-L529 and https://github.com/apache/cloudstack/blob/b215abc30a22d6b11f016b8f402981445140f577/server/src/main/java/org/apache/cloudstack/ha/task/FenceTask.java#L48-L53. The function always returns true in the `fencing` state, and because the OOBM call throws an exception, the result will be `false`, so we never get out of the fencing "loop".

One workaround could be to introduce a maximum number of attempts to wait before the host changes state to `ineligible` or `disabled`, as is done for recovery, or to handle the reset and powerOff OOBM errors so that they do not raise an exception.
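
To make the first workaround concrete, here is a minimal, self-contained sketch of a bounded fence retry. It is not CloudStack code: `FenceRetrySketch`, `FenceTracker`, `HostHaState`, and the `maxAttempts` limit are all hypothetical names used only for illustration, and the OOBM call is modeled as a plain `BooleanSupplier` that may throw.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch only; none of these types exist in CloudStack.
public class FenceRetrySketch {

    // Hypothetical stand-in for the HA states mentioned in the comment.
    enum HostHaState { FENCING, FENCED, INELIGIBLE }

    // Counts failed fence attempts and gives up after a fixed limit.
    static final class FenceTracker {
        private final int maxAttempts; // assumed setting, not an existing one
        private int failedAttempts = 0;
        private HostHaState state = HostHaState.FENCING;

        FenceTracker(int maxAttempts) {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("maxAttempts must be >= 1");
            }
            this.maxAttempts = maxAttempts;
        }

        // Runs one fence attempt; an exception counts as a failed attempt
        // instead of leaving the host in FENCING with no way out.
        HostHaState attempt(BooleanSupplier fenceOperation) {
            if (state != HostHaState.FENCING) {
                return state; // already resolved, nothing to do
            }
            boolean fenced;
            try {
                fenced = fenceOperation.getAsBoolean();
            } catch (RuntimeException e) {
                fenced = false; // e.g. OOBM unreachable or exit code 1
            }
            if (fenced) {
                state = HostHaState.FENCED;
            } else if (++failedAttempts >= maxAttempts) {
                state = HostHaState.INELIGIBLE; // leave the fencing loop
            }
            return state;
        }

        int failedAttempts() {
            return failedAttempts;
        }
    }

    public static void main(String[] args) {
        FenceTracker tracker = new FenceTracker(3);
        // Simulates a powered-off host whose OOBM call always throws.
        BooleanSupplier unreachableOobm = () -> {
            throw new IllegalStateException("OOBM unreachable");
        };
        for (int i = 1; i <= 5; i++) {
            System.out.println("attempt " + i + " -> " + tracker.attempt(unreachableOobm));
        }
    }
}
```

With a limit of 3, the simulated host stays in `FENCING` for the first two failures and moves to `INELIGIBLE` on the third, so the loop ends even though the OOBM call never succeeds. Whether the target state should be `ineligible` or `disabled`, and where such a counter would live, is left open here.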