weizhouapache commented on issue #7734: URL: https://github.com/apache/cloudstack/issues/7734#issuecomment-1647128114
> Hi @kiranchavala, it looks like I'm having the same issue. I've got a single node acting as both management server and hypervisor, running Ubuntu Server 22.04 LTS, with an advanced zone for networking and CS version 4.18.0.0. I've got the required networking for k8s to work; the management network is able to SSH into the control and worker nodes. Whether or not I choose k8s control plane HA, some nodes end up creating the `success` file in `~cloud`, whereas others do not, and the systemd service appears to be failing.
>
> Below are the results of a non-HA control plane. I've got 1 master and a worker node.
>
> ```
> cloud@cks-ha-control-18980b116ac:~$ systemctl status deploy-kube-system.service
> ● deploy-kube-system.service
>      Loaded: loaded (/etc/systemd/system/deploy-kube-system.service; static)
>      Active: active (running) since Sun 2023-07-23 03:13:38 UTC; 278ms ago
>    Main PID: 115110 (deploy-kube-sys)
>       Tasks: 9 (limit: 4560)
>      Memory: 11.7M
>         CPU: 351ms
>      CGroup: /system.slice/deploy-kube-system.service
>              ├─115110 /bin/bash -e /opt/bin/deploy-kube-system
>              ├─115164 kubeadm init --token 48a920.5061c949f595b0c2 --token-ttl 0 --control-plane-endpoint 10.231.11.233:6443 --upload-certs --certificate-key 48a9205061c949f595b0c235f311672948a9205061c94>
>              └─115175 systemctl is-active firewalld
> cloud@cks-ha-control-18980b116ac:~$ systemctl status deploy-kube-system.service
> ● deploy-kube-system.service
>      Loaded: loaded (/etc/systemd/system/deploy-kube-system.service; static)
>      Active: active (running) since Sun 2023-07-23 03:13:39 UTC; 295ms ago
>    Main PID: 115352 (deploy-kube-sys)
>       Tasks: 14 (limit: 4560)
>      Memory: 22.6M
>         CPU: 367ms
>      CGroup: /system.slice/deploy-kube-system.service
>              ├─115352 /bin/bash -e /opt/bin/deploy-kube-system
>              ├─115408 kubeadm init --token 48a920.5061c949f595b0c2 --token-ttl 0 --control-plane-endpoint 10.231.11.233:6443 --upload-certs --certificate-key 48a9205061c949f595b0c235f311672948a9205061c94>
>              └─115425 kubelet --version
> cloud@cks-ha-control-18980b116ac:~$ systemctl status deploy-kube-system.service
> ● deploy-kube-system.service
>      Loaded: loaded (/etc/systemd/system/deploy-kube-system.service; static)
>      Active: activating (auto-restart) (Result: exit-code) since Sun 2023-07-23 03:13:41 UTC; 40ms ago
>     Process: 115519 ExecStart=/opt/bin/deploy-kube-system (code=exited, status=1/FAILURE)
>    Main PID: 115519 (code=exited, status=1/FAILURE)
>         CPU: 440ms
> ```
>
> and the `daemon.log` on the master node where it is not ready,
>
> ```
> Jul 23 03:14:00 systemvm deploy-kube-system[118790]: W0723 03:14:00.934944 118790 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
> Jul 23 03:14:00 systemvm deploy-kube-system[118790]: [init] Using Kubernetes version: v1.24.0
> Jul 23 03:14:00 systemvm deploy-kube-system[118790]: [preflight] Running pre-flight checks
> Jul 23 03:14:00 systemvm deploy-kube-system[118790]: #011[WARNING SystemVerification]: missing optional cgroups: blkio
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: error execution phase preflight: [preflight] Some fatal errors occurred:
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR Port-6443]: Port 6443 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR Port-10259]: Port 10259 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR Port-10257]: Port 10257 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR Port-10250]: Port 10250 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR Port-2379]: Port 2379 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR Port-2380]: Port 2380 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
> Jul 23 03:14:01 systemvm deploy-kube-system[118790]: To see the stack trace of this error execute with --v=5 or higher
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: W0723 03:14:01.047480 118817 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: [init] Using Kubernetes version: v1.24.0
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: [preflight] Running pre-flight checks
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[WARNING SystemVerification]: missing optional cgroups: blkio
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: error execution phase preflight: [preflight] Some fatal errors occurred:
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR Port-6443]: Port 6443 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR Port-10259]: Port 10259 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR Port-10257]: Port 10257 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR Port-10250]: Port 10250 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR Port-2379]: Port 2379 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR Port-2380]: Port 2380 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
> Jul 23 03:14:01 systemvm deploy-kube-system[118817]: To see the stack trace of this error execute with --v=5 or higher
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: W0723 03:14:01.162516 118844 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: [init] Using Kubernetes version: v1.24.0
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: [preflight] Running pre-flight checks
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[WARNING SystemVerification]: missing optional cgroups: blkio
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: error execution phase preflight: [preflight] Some fatal errors occurred:
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR Port-6443]: Port 6443 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR Port-10259]: Port 10259 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR Port-10257]: Port 10257 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR Port-10250]: Port 10250 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR Port-2379]: Port 2379 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR Port-2380]: Port 2380 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: #011[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
> Jul 23 03:14:01 systemvm deploy-kube-system[118844]: To see the stack trace of this error execute with --v=5 or higher
> Jul 23 03:14:01 systemvm deploy-kube-system[118789]: Error: kubeadm init failed!
> Jul 23 03:14:01 systemvm systemd[1]: deploy-kube-system.service: Main process exited, code=exited, status=1/FAILURE
> Jul 23 03:14:01 systemvm systemd[1]: deploy-kube-system.service: Failed with result 'exit-code'.
> Jul 23 03:14:01 systemvm systemd[1]: deploy-kube-system.service: Scheduled restart job, restart counter is at 1429.
> Jul 23 03:14:01 systemvm systemd[1]: Stopped deploy-kube-system.service.
> Jul 23 03:14:01 systemvm systemd[1]: Started deploy-kube-system.service.
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: W0723 03:14:01.439074 118873 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: [init] Using Kubernetes version: v1.24.0
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: [preflight] Running pre-flight checks
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[WARNING SystemVerification]: missing optional cgroups: blkio
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: error execution phase preflight: [preflight] Some fatal errors occurred:
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR Port-6443]: Port 6443 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR Port-10259]: Port 10259 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR Port-10257]: Port 10257 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR Port-10250]: Port 10250 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR Port-2379]: Port 2379 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR Port-2380]: Port 2380 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: #011[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
> Jul 23 03:14:01 systemvm deploy-kube-system[118873]: To see the stack trace of this error execute with --v=5 or higher
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: W0723 03:14:01.552586 118899 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: [init] Using Kubernetes version: v1.24.0
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: [preflight] Running pre-flight checks
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[WARNING SystemVerification]: missing optional cgroups: blkio
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: error execution phase preflight: [preflight] Some fatal errors occurred:
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR Port-6443]: Port 6443 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR Port-10259]: Port 10259 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR Port-10257]: Port 10257 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR Port-10250]: Port 10250 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR Port-2379]: Port 2379 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR Port-2380]: Port 2380 is in use
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: #011[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
> Jul 23 03:14:01 systemvm deploy-kube-system[118899]: To see the stack trace of this error execute with --v=5 or higher
> Jul 23 03:14:01 systemvm deploy-kube-system[118925]: W0723 03:14:01.666300 118925 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
> Jul 23 03:14:01 systemvm deploy-kube-system[118925]: [init] Using Kubernetes version: v1.24.0
> ```
>
> ```
> # kubectl --kubeconfig /etc/kubernetes/admin.conf get no
> NAME                         STATUS     ROLES    AGE   VERSION
> cks-ha-control-18980b116ac   NotReady   <none>   23m   v1.24.0
> ```
>
> Please ignore the `ha`.
>
> Thanks

@zap51 I tested v1.24.0 last week, and it worked well. Can you share the output of `journalctl -xu deploy-kube-system` on the control node?
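
Because the service restarts in a loop, the quoted `daemon.log` repeats the same preflight failures for every attempt. When collecting the `journalctl` output, it may help to collapse those repeats into one line per distinct error. Below is a minimal sketch, assuming log lines in the syslog format shown in the quoted log; the function name `summarize_preflight_errors` is purely illustrative and is not part of CloudStack or kubeadm.

```python
import re
from collections import Counter

# Matches kubeadm preflight error lines such as:
#   "... #011[ERROR Port-6443]: Port 6443 is in use"
# Group 1 is the check name, group 2 is the message.
ERROR_RE = re.compile(r"\[ERROR ([^\]]+)\]: (.*)$")


def summarize_preflight_errors(lines):
    """Count each distinct (check, message) pair across all restart attempts."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line.rstrip())
        if match:
            counts[match.groups()] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the quoted log; a real run would read a saved
    # copy of the journal or daemon.log instead.
    sample = [
        "Jul 23 03:14:01 systemvm deploy-kube-system[118790]: #011[ERROR Port-6443]: Port 6443 is in use",
        "Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR Port-6443]: Port 6443 is in use",
        "Jul 23 03:14:01 systemvm deploy-kube-system[118817]: #011[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty",
    ]
    for (check, message), n in summarize_preflight_errors(sample).items():
        print(f"{n}x {check}: {message}")
```

The warning lines (`[WARNING SystemVerification]`) are deliberately ignored, since only the `[ERROR ...]` entries are fatal for `kubeadm init`.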
