[GitHub] [apisix] MirtoBusico opened a new issue, #8566: help request: how to recover a corrupted etcd after a crash?

GitBox Fri, 23 Dec 2022 09:30:33 -0800


MirtoBusico opened a new issue, #8566:
URL: https://github.com/apache/apisix/issues/8566


   ### Description
   
   Hi all,
   I have a K3S cluster with 3 worker nodes (plus 1 master) and APISIX 2.15.1 installed as a LoadBalancer using the Helm chart.
   
   Every node is a KVM virtual machine on the same host.
   
   After a host crash, the three etcd pods never come back online.
   
   The logs of the first etcd pod (apisix-etcd-0) show:
   ```
   
   {"level":"warn","ts":"2022-12-23T17:15:46.357Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_DAEMON_USER=etcd"}
   {"level":"info","ts":"2022-12-23T17:15:46.357Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd"]}
   {"level":"warn","ts":"2022-12-23T17:15:46.357Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"member_id","data-dir":"/bitnami/etcd/data"}
   {"level":"info","ts":"2022-12-23T17:15:46.357Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/bitnami/etcd/data","dir-type":"member"}
   {"level":"info","ts":"2022-12-23T17:15:46.357Z","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["http://0.0.0.0:2380"]}
   {"level":"info","ts":"2022-12-23T17:15:46.357Z","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":["http://0.0.0.0:2379"]}
   {"level":"info","ts":"2022-12-23T17:15:46.358Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.4","git-sha":"08407ff76","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":6,"max-cpu-available":6,"member-initialized":true,"name":"apisix-etcd-0","data-dir":"/bitnami/etcd/data","wal-dir":"","wal-dir-dedicated":"","member-dir":"/bitnami/etcd/data/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2379","http://apisix-etcd.apisix.svc.cluster.local:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
   {"level":"info","ts":"2022-12-23T17:15:46.358Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/bitnami/etcd/data/member/snap/db","took":"159.119µs"}
   {"level":"info","ts":"2022-12-23T17:15:46.473Z","caller":"etcdserver/server.go:508","msg":"recovered v2 store from snapshot","snapshot-index":200002,"snapshot-size":"26 kB"}
   {"level":"warn","ts":"2022-12-23T17:15:46.474Z","caller":"snap/db.go:88","msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":200002,"snapshot-file-path":"/bitnami/etcd/data/member/snap/0000000000030d42.snap.db","error":"snap: snapshot file doesn't exist"}
   {"level":"panic","ts":"2022-12-23T17:15:46.474Z","caller":"etcdserver/server.go:515","msg":"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.NewServer\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdserver/server.go:515\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\t/go/src/go.etcd.io/etcd/release/etcd/server/embed/etcd.go:245\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:228\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:123\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"}
   panic: failed to recover v3 backend from snapshot
   goroutine 1 [running]:
   ```
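
One observation on the warning, offered as a sketch rather than a diagnosis: the missing file name looks like the snapshot index (200002) written as 16 zero-padded hexadecimal digits. The helper names `snap_db_name` and `snap_index_from_name` are hypothetical, and the naming scheme is an assumption inferred from the `snapshot-index` and `snapshot-file-path` fields in the log, not taken from the etcd documentation.

```python
def snap_db_name(snapshot_index: int) -> str:
    """Return the .snap.db file name that corresponds to a snapshot index.

    Assumption: the name is the index as 16 zero-padded lowercase hex
    digits plus ".snap.db", which matches the path in the warning.
    """
    return f"{snapshot_index:016x}.snap.db"


def snap_index_from_name(file_name: str) -> int:
    """Inverse of snap_db_name: recover the index from a file name."""
    hex_part = file_name.split(".", 1)[0]
    return int(hex_part, 16)


if __name__ == "__main__":
    # Values copied from the warning line in the log.
    print(snap_db_name(200002))
    print(snap_index_from_name("0000000000030d42.snap.db"))
```

With the values from the log, the two fields agree with each other, so the path in the warning appears to be derived from the index of the snapshot that was just recovered.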
   
   How can I recover etcd?
   Recreating an empty etcd would also be fine.
   
   ### Environment
   
   
   -    APISIX version (run apisix version):
   
   root@apisix-64fffcfb4c-55vhw:/usr/local/apisix# apisix version
   /usr/local/openresty/luajit/bin/luajit ./apisix/cli/apisix.lua version
   2.15.1
   root@apisix-64fffcfb4c-55vhw:/usr/local/apisix#
   
   -    Operating system (run uname -a):
   
   root@apisix-64fffcfb4c-55vhw:/usr/local/apisix# uname -a
   Linux apisix-64fffcfb4c-55vhw 5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64 GNU/Linux
   root@apisix-64fffcfb4c-55vhw:/usr/local/apisix# 
   
   -    OpenResty / Nginx version (run openresty -V or nginx -V):
   -    etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
   -    APISIX Dashboard version, if relevant: 2.13.0
   -    Plugin runner version, for issues related to plugin runners:
   -    LuaRocks version, for installation issues (run luarocks --version):
   


