bokjo commented on issue #11383:
URL: https://github.com/apache/apisix/issues/11383#issuecomment-2771957718

   > I'm experiencing this issue with the latest chart on GKE.
   > 
   > I don't have a permanent solution, but this quick fix solves it 
temporarily:
   > 
   > 1. Scale the stateful set down to 2
   > 2. Delete the PVC data-apisix-etcd-X, where X is the number of the etcd 
failing
   > 3. Scale back up to 3
   > 
   > Edit: Doing a helm uninstall, deleting the PVCs and reinstalling also 
fixes the issue for me. I installed etcd separately as described 
[here](https://github.com/apache/apisix/issues/11338#issuecomment-2552935507) 
to do this without taking down apisix.
   
   This no longer works. After the deletion, the pod gets stuck starting etcd 
and makes no progress, while using a lot of CPU.
   
   
   New workaround: 
   
   While the broken etcd pod is still running, delete the `snap` and `wal` 
directories under `member` on the mounted PV, then kill/restart the pod!
   
   Walkthrough for GKE. Not ideal, but it works (the APISIX etcd is just too 
flaky: it gets corrupted even when APISIX resources change).
   
   ```sh
   
   # ssh into the node where the pod is running and the PVC is mounted
   # sudo su - # if you need to...
   
   mount -l | grep pvc-e5bc8149-...
   
   cd /var/lib/kubelet/pods/97496c02-.../volumes/kubernetes.io~csi/pvc-e5bc8149-.../mount
   
   cd data/member
   
   rm -rf snap
   rm -rf wal
   
   # restart the pod
   # `member_id` should appear in `../mount/data` and the pod
   # should start successfully
   
   ```
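   
   A rough sketch of how the node lookup, the PV lookup and the pod restart 
   from the walkthrough could be done with `kubectl`. The namespace `apisix` 
   and the pod name `apisix-etcd-0` are assumptions (adjust them to your 
   release); the PVC name follows the `data-apisix-etcd-X` pattern quoted 
   above. `KUBECTL` is only a hypothetical override so the commands can be 
   dry-run with `KUBECTL=echo`.
   
   ```sh
   # Assumed defaults, not taken from the issue: change to match your install.
   NS="${NS:-apisix}"
   POD="${POD:-apisix-etcd-0}"
   KUBECTL="${KUBECTL:-kubectl}"

   # Print the node the etcd pod is scheduled on (the node to ssh into).
   etcd_pod_node() {
     "$KUBECTL" get pod "$POD" -n "$NS" -o jsonpath='{.spec.nodeName}'
   }

   # Print the PV name (pvc-...) bound to the pod's PVC, to grep for in `mount -l`.
   etcd_pod_pv() {
     "$KUBECTL" get pvc "data-$POD" -n "$NS" -o jsonpath='{.spec.volumeName}'
   }

   # Delete the pod so the StatefulSet recreates it after snap/wal were removed.
   restart_etcd_pod() {
     "$KUBECTL" delete pod "$POD" -n "$NS"
   }
   ```
   
   Call `etcd_pod_node` and `etcd_pod_pv` before the ssh step, and 
   `restart_etcd_pod` once `snap` and `wal` are gone.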
   
   Hope this helps!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
