bokjo commented on issue #11383: URL: https://github.com/apache/apisix/issues/11383#issuecomment-2771957718
> I'm experiencing this issue with the latest chart on GKE.
>
> I don't have a permanent solution, but this quick fix solves it temporarily:
>
> 1. Scale the stateful set down to 2
> 2. Delete the PVC data-apisix-etcd-X, where X is the number of the etcd failing
> 3. Scale back up to 3
>
> Edit: Doing a helm uninstall, deleting the PVCs and reinstalling also fixes the issue for me. I installed etcd separately as described [here](https://github.com/apache/apisix/issues/11338#issuecomment-2552935507) to do this without taking down apisix.

This no longer works... after the deletion the pod just gets stuck starting etcd and does nothing, while using high CPU.

New workaround: even while the broken etcd pod is running, clear the member `snap` and `wal` directories from the mounted PV and kill/restart the pod!

Walkthrough for GKE... not ideal but works (APISIX etcd is just too flaky... it gets corrupted even on APISIX resource changes...)

```sh
# ssh into the node where the pod is running and the PVC is mounted
# sudo su - # if you need to...
mount -l | grep pvc-e5bc8149-...
cd /var/lib/kubelet/pods/97496c02-.../volumes/kubernetes.io~csi/pvc-e5bc8149-.../mount
cd data/member
rm -rf snap
rm -rf wal
# restart the pod
# `member_id` should appear in `../mount/data` and the pod starts successfully
```

Hope this helps!
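For the steps around the walkthrough (finding the node to ssh into, the `pvc-...` directory to grep for, and restarting the pod), a minimal sketch with `kubectl` could look like the following. The namespace `apisix` and ordinal `2` are placeholders, the pod name `apisix-etcd-X` is only inferred from the PVC name `data-apisix-etcd-X`, and the helper function names are made up for illustration, so adjust all of them to your install.

```sh
#!/bin/sh
# Assumptions (not from the original comment): NAMESPACE and ORDINAL are
# placeholders; the pod name is inferred from the PVC name data-apisix-etcd-X.
NAMESPACE="${NAMESPACE:-apisix}"
ORDINAL="${ORDINAL:-2}"
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to preview the commands

POD="apisix-etcd-${ORDINAL}"
PVC="data-apisix-etcd-${ORDINAL}"

# Node the broken pod is scheduled on (the one to ssh into).
find_node() {
  $KUBECTL get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.spec.nodeName}'
}

# PV bound to the PVC; for dynamically provisioned volumes this is the
# pvc-... name to grep for in `mount -l`.
find_pv() {
  $KUBECTL get pvc "$PVC" -n "$NAMESPACE" -o jsonpath='{.spec.volumeName}'
}

# Deleting the pod lets the StatefulSet controller recreate it.
restart_pod() {
  $KUBECTL delete pod "$POD" -n "$NAMESPACE"
}
```

Source the snippet, call `find_node` and `find_pv` before you ssh into the node, and call `restart_pod` once `snap` and `wal` are removed.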
