devops-42 opened a new issue, #9959:
URL: https://github.com/apache/cloudstack/issues/9959

   
   ##### ISSUE TYPE
    * Bug Report
   
   ##### COMPONENT NAME
   ~~~
   API / Backend
   ~~~
   
   ##### CLOUDSTACK VERSION
   ~~~
   4.19.1.x
   ~~~
   
   ##### CONFIGURATION
   The issue occurs with both basic and advanced networking. The configuration is shown below.
   
   ##### OS / ENVIRONMENT
   Used setup:
   Management + KVM host:
   * OS: Ubuntu (jammy)
   * Network: both use the same CIDR
   
   Management:
   * serves MySQL
   * serves NFS shares for primary and secondary storage
    
   KVM host:
   * cloudbr0 configured 
   * Agent installed
   * Joined via public ssh key (root)
   
   ##### SUMMARY
   When setting up a zone (basic or advanced), the KVM host joins the cluster, but the SSVM and the CPVM remain stuck in "Starting". The SSVM log file shows an SSL error:
   ```
   2024-11-20 23:54:22,618 WARN  [cloud.agent.Agent] (main:null) NIO Connection Exception  com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: **.**.**.** port: 8250
   ```
   The same log indicates that the cloud agent on the SSVM was not able to find the keystore:
   ```
   2024-11-20 23:54:21,927 WARN  [utils.nio.Link] (main:null) Failed to load keystore, using trust all manager
   ```
   After some experimenting, I found that the cloud agent expects a keystore `cloud.jks` in the `/usr/local/cloud/systemvm/conf` directory, which is populated from the `/etc/cloudstack` directory. Unfortunately, `/etc/cloudstack` is empty on the VM.
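
The keystore check described above can be sketched as a small shell function to run inside the system VM. This is a minimal sketch, not part of CloudStack: the file name `cloud.jks` and the default directory are taken from this report, while the function name `keystore_status` and its output format are hypothetical.

```shell
# Report whether the agent keystore exists and is non-empty in a conf directory.
# cloud.jks and the default directory come from the issue text;
# the function name keystore_status is hypothetical.
keystore_status() {
    conf_dir="${1:-/usr/local/cloud/systemvm/conf}"
    keystore="$conf_dir/cloud.jks"
    if [ -s "$keystore" ]; then
        # File exists and has a size greater than zero.
        echo "present: $keystore"
        return 0
    elif [ -e "$keystore" ]; then
        # File exists but is empty, so it cannot hold any certificates.
        echo "empty: $keystore"
        return 1
    else
        echo "missing: $keystore"
        return 1
    fi
}
```

Called without an argument, `keystore_status` checks the default directory; a different directory can be passed as the first argument. It only tests for the file, it does not validate the keystore contents.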
   
   I already tried to work around this by setting the global configuration parameter `ca.plugin.root.auth.strictness` to `false`. This did not really work for me and gave unexpected results:
   * The agent state of the system VMs immediately turned to `Up`, while the overall state remained `Starting`.
   * After restarting the CloudStack management server and/or the CloudStack agent, both states were `Up`.
   * Creating a compute instance with a guest network is not possible; the virtual router instance aborts with an error state (possibly due to the same keystore issue).
   
   ##### STEPS TO REPRODUCE
   
   Setup management server:
   ~~~
   apt-get install -y \
     apt-transport-https \
     bridge-utils \
     ca-certificates \
     curl \
     chrony \
     gnupg \
     lsb-release \
     mysql-server \
     net-tools \
     nfs-kernel-server \
     quota \
     software-properties-common \
     unattended-upgrades
   
   cat <<'EOF' > /etc/mysql/mysql.conf.d/cloudstack.cnf
   [mysqld]
   server-id=1
   innodb_rollback_on_timeout=1
   innodb_lock_wait_timeout=600
   max_connections=350
   log-bin=mysql-bin
   binlog-format = 'ROW'
   EOF
   systemctl restart mysql
   
   wget -O - https://download.cloudstack.org/release.asc | tee /etc/apt/trusted.gpg.d/cloudstack.asc
   echo "deb https://download.cloudstack.org/ubuntu noble 4.19" | tee /etc/apt/sources.list.d/cloudstack.list
   apt-get update
   apt-get install -y cloudstack-management
   
   mkdir -p /export/primary /export/secondary
   echo "/export  *(rw,async,no_root_squash,no_subtree_check,insecure)" >> /etc/exports
   exportfs -a
   sed -i -e 's/^RPCMOUNTDOPTS="--manage-gids"$/RPCMOUNTDOPTS="-p 892 --manage-gids"/g' /etc/default/nfs-kernel-server
   sed -i -e 's/^STATDOPTS=$/STATDOPTS="--port 662 --outgoing-port 2020"/g' /etc/default/nfs-common
   echo "NEED_STATD=yes" >> /etc/default/nfs-common
   sed -i -e 's/^RPCRQUOTADOPTS=$/RPCRQUOTADOPTS="-p 875"/g' /etc/default/quota
   service nfs-kernel-server restart
   
   cloudstack-setup-databases ***:***@localhost --deploy-as=root -i 127.0.0.1
   cloudstack-setup-management
   ~~~
   
   Setup KVM host:
   ~~~
   apt-get install -y \
     apt-transport-https \
     bridge-utils \
     ca-certificates \
     curl \
     chrony \
     gnupg \
     lsb-release \
     net-tools \
     quota \
     software-properties-common \
     unattended-upgrades
   
   cat <<'EOM' > /etc/netplan/01-netcfg.yaml
   network:
     version: 2
     ethernets:
       eth0: {}
     bridges:
       cloudbr0:
         addresses:
           - **.**.**.**/**
         nameservers:
           addresses:
             - **.**.**.**
         routes:
           - to: default
             via: **.**.**.**
             metric: 100
         interfaces: [eth0]
   EOM
   chmod 600 /etc/netplan/01-netcfg.yaml
   mv /etc/netplan/50-cloud-init.yaml /etc/netplan/50-cloud-init.yaml.dist
   netplan generate && netplan apply
   
   wget -O - https://download.cloudstack.org/release.asc | tee /etc/apt/trusted.gpg.d/cloudstack.asc
   echo "deb https://download.cloudstack.org/ubuntu noble 4.19" | tee /etc/apt/sources.list.d/cloudstack.list
   apt-get update
   apt-get install -y qemu-kvm cloudstack-agent
   
   sed -i -e 's/\#vnc_listen.*$/vnc_listen = "0.0.0.0"/g' /etc/libvirt/qemu.conf
   systemctl mask libvirtd.socket libvirtd-ro.socket libvirtd-admin.socket libvirtd-tls.socket libvirtd-tcp.socket
   systemctl restart libvirtd
   
   mv /etc/libvirt/libvirtd.conf /etc/libvirt/libvirtd.conf.dist
   cat <<'EOM' > /etc/libvirt/libvirtd.conf
   listen_tls=0
   listen_tcp=0
   tcp_port = "16509"
   mdns_adv = 0
   auth_tcp = "none"
   EOM
   
   systemctl restart libvirtd
   
   modprobe br_netfilter
   echo 'net.bridge.bridge-nf-call-arptables = 0' >> /etc/sysctl.conf
   echo 'net.bridge.bridge-nf-call-iptables = 0' >> /etc/sysctl.conf
   echo 'net.bridge.bridge-nf-call-ip6tables = 0' >> /etc/sysctl.conf
   sysctl -p
   ~~~
   
   * Copy the SSH key of the CloudStack management server to the KVM host.
   * Wait until the management server is ready.
   * Log in and create an advanced zone, using the gateway of the CloudStack subnet and assigning reserved IP ranges to the pod and to public traffic.
   * Create primary and secondary storage, and join the host using the SSH key and the root account.
   * Enable the zone.
   * Navigate to the System VMs view under the Infrastructure menu and see two VMs in the Starting state.
   
   
   ##### EXPECTED RESULTS
   
   The SSVM and CPVM start up and the cloud agent is running. Creating compute instances that use a virtual router (isolated guest network) is possible.
   
   ##### ACTUAL RESULTS
   
   Here is the (hopefully) relevant log snippet.
   ~~~
   2024-11-20 23:54:21,734 INFO  [cloud.agent.Agent] (main:null) Agent [id = new : type = PremiumSecondaryStorageResource : zone = 1 : pod = 1 : workers = 5 : host = **.**.**.** : port = 8250
   2024-11-20 23:54:21,809 INFO  [utils.nio.NioClient] (main:null) Connecting to **.**.**.**:8250
   2024-11-20 23:54:21,828 INFO  [utils.nio.Link] (main:null) Conf file found: /usr/local/cloud/systemvm/conf/agent.properties
   2024-11-20 23:54:21,927 WARN  [utils.nio.Link] (main:null) Failed to load keystore, using trust all manager
   2024-11-20 23:54:22,597 ERROR [utils.nio.Link] (main:null) SSL error caught during unwrap data: Received fatal alert: bad_certificate, for local address=/**.**.**.**:43322, remote address=/**.**.**.**:8250. The client may have invalid ca-certificates.
   2024-11-20 23:54:22,602 ERROR [utils.nio.NioClient] (main:null) SSL Handshake failed while connecting to host: **.**.**.** port: 8250
   2024-11-20 23:54:22,604 ERROR [utils.nio.NioConnection] (main:null) Unable to initialize the threads.
   java.io.IOException: SSL Handshake failed while connecting to host: **.**.**.** port: 8250
           at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
           at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
           at com.cloud.agent.Agent.start(Agent.java:286)
           at com.cloud.agent.AgentShell.launchNewAgent(AgentShell.java:454)
           at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:431)
           at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:415)
           at com.cloud.agent.AgentShell.start(AgentShell.java:511)
           at com.cloud.agent.AgentShell.main(AgentShell.java:541)
   2024-11-20 23:54:22,618 WARN  [cloud.agent.Agent] (main:null) NIO Connection Exception  com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: **.**.**.** port: 8250
   2024-11-20 23:54:22,618 INFO  [cloud.agent.Agent] (main:null) Attempted to connect to the server, but received an unexpected exception, trying again...
   ~~~
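
To see how often the two failures in the snippet above recur across retries, the agent log can be summarized with a small shell function. This is only a sketch: the two search strings are copied from the log lines in this report, while the function name `ssl_log_summary`, its output format, and the log path used in the usage note are hypothetical.

```shell
# Count keystore and TLS handshake messages in an agent log file.
# The search strings are copied from the log lines in this issue;
# the function name and the output format are hypothetical.
ssl_log_summary() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read: $log_file" >&2
        return 2
    fi
    # grep -c prints the number of matching lines but exits 1 when that
    # number is zero, so "|| true" keeps the function usable under "set -e".
    keystore_failures="$(grep -cF 'Failed to load keystore' "$log_file" || true)"
    handshake_failures="$(grep -cF 'SSL Handshake failed' "$log_file" || true)"
    echo "keystore_failures=$keystore_failures handshake_failures=$handshake_failures"
}
```

For example, `ssl_log_summary /path/to/agent.log` prints both counts on one line. The counts are per matching line, so a handshake failure that is logged several times (as in the snippet above) is counted once per line.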
   
   Thanks for looking at it✌️

