OmCheeLin opened a new pull request, #882:
URL: https://github.com/apache/skywalking-banyandb/pull/882

   # Internal TLS Certificate Dynamic Reloading Test Guide
   
   This document describes how to test the dynamic reloading of certificates 
for internal TLS communication between liaison nodes and data nodes.
   
   ## Overview
   
   The internal TLS dynamic reloading feature allows certificates used for 
internal TLS communication to be updated without restarting the service:
   
   - **Liaison nodes**: CA certificate (`--data-client-ca-cert`) used to verify 
data node server certificates, and server certificates (`--cert-file`, 
`--key-file`) used for serving clients
   - **Data nodes**: Server certificates (`--cert-file`, `--key-file`) used for 
serving liaison node clients
   
   This is similar to the external TLS certificate dynamic reloading feature 
implemented in #12862.
   
   ## Test Scenario
   
   ### Prerequisites
   
   1. Build the BanyanDB server:
      ```bash
      make build
      ```
   
   2. Prepare test certificates directory:
      ```bash
      mkdir -p ~/test-internal-certs && cd ~/test-internal-certs
      ```
   
   ### Step 1: Generate Initial CA and Server Certificates
   
   **Important**: The server certificate's Subject Alternative Name (SAN) must
   include the hostname that nodes use when they register with etcd. If your
   nodes register with a hostname like `master01` or `node1`, the certificate
   must include that hostname. If you use `localhost`, ensure all connections
   use `localhost` as well.
   
   **Note**: Current versions of Go's TLS library (used by gRPC) verify
   hostnames only against SAN entries and ignore the Common Name (CN), so a
   certificate without a SAN extension fails hostname verification. The
   certificate generation steps below include SAN extensions.
   
   ```bash
   cd ~/test-internal-certs
   
   # Generate CA certificate
   openssl req -x509 -newkey rsa:2048 -keyout ca_key1.pem -out ca_cert1.pem \
     -days 365 -nodes -subj "/CN=TestCA1"
   cp ca_cert1.pem ca_cert.pem
   
   # Generate server private key
   openssl genrsa -out server_key1.pem 2048
   
   # Create certificate configuration file (for SAN extension)
   # Replace 'master01' with your actual hostname
   cat > server_cert.conf <<EOF
   [req]
   distinguished_name = req_distinguished_name
   req_extensions = v3_req
   prompt = no
   
   [req_distinguished_name]
   CN = master01
   
   [v3_req]
   subjectAltName = @alt_names
   
   [alt_names]
   DNS.1 = master01
   DNS.2 = localhost
   IP.1 = 127.0.0.1
   EOF
   
   # Generate certificate signing request (with SAN extension)
   openssl req -new -key server_key1.pem -out server_csr1.pem \
     -subj "/CN=master01" \
     -config server_cert.conf
   
   # Sign certificate with CA (includes SAN extension)
   openssl x509 -req -in server_csr1.pem -CA ca_cert1.pem -CAkey ca_key1.pem \
     -CAcreateserial -out server_cert1.pem -days 365 \
     -extensions v3_req -extfile server_cert.conf
   
   # Copy to the filenames used by the servers
   cp server_cert1.pem server_cert.pem
   cp server_key1.pem server_key.pem
   
   # Verify the certificate
   echo "=== Check certificate Subject ==="
   openssl x509 -in server_cert.pem -text -noout | grep -A 2 "Subject:"
   
   echo "=== Check SAN extension ==="
   openssl x509 -in server_cert.pem -text -noout | grep -A 5 "Subject Alternative Name"
   
   echo "=== Verify CA can verify server certificate ==="
   openssl verify -CAfile ca_cert.pem server_cert.pem
   ```
   
   The verification should show:
   - Subject: CN = master01
   - Subject Alternative Name: DNS:master01, DNS:localhost, IP Address:127.0.0.1
   - Verification result: `server_cert.pem: OK`
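   
   Because hostname verification depends on the SAN entries, it may help to
   check every name that nodes register with before starting the servers. The
   following is a minimal sketch, assuming an OpenSSL version whose `x509`
   command supports `-checkhost`; `cert_covers_host` is a hypothetical helper
   name, not part of BanyanDB:
   
   ```bash
   # Hypothetical helper: succeeds if the certificate covers the given DNS name.
   # openssl prints "Hostname <name> does match certificate" on a match and
   # "Hostname <name> does NOT match certificate" otherwise.
   cert_covers_host() {
     cert=$1
     host=$2
     openssl x509 -in "$cert" -noout -checkhost "$host" 2>/dev/null \
       | grep -q "does match certificate"
   }
   
   # Usage: check the names from the [alt_names] section above.
   if [ -f server_cert.pem ]; then
     for name in master01 localhost; do
       if cert_covers_host server_cert.pem "$name"; then
         echo "$name: covered"
       else
         echo "$name: NOT covered"
       fi
     done
   fi
   ```
   
   Replace `master01` with the hostname your nodes actually register with.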
   
   ### Step 2: Start Data Node with TLS
   
   ```bash
   CERTS_DIR="$HOME/test-internal-certs"
   ./banyand/build/bin/dev/banyand-server data \
     --etcd-endpoints=http://127.0.0.1:2379 \
     --tls=true \
     --cert-file=$CERTS_DIR/server_cert.pem \
     --key-file=$CERTS_DIR/server_key.pem \
     --grpc-port=17912 \
     --http-port=17913 \
     --measure-root-path=/tmp/test-data-measure \
     --stream-root-path=/tmp/test-data-stream
   ```
   
   ### Step 3: Start Liaison Node with Internal TLS
   
   ```bash
   CERTS_DIR="$HOME/test-internal-certs"
   ./banyand/build/bin/dev/banyand-server liaison \
     --etcd-endpoints=http://127.0.0.1:2379 \
     --data-client-tls \
     --data-client-ca-cert=$CERTS_DIR/ca_cert.pem \
     --grpc-port=17914 \
     --http-port=17915
   ```
   
   **Note**: The flag names are `--data-client-tls` and 
`--data-client-ca-cert`. The prefix "data" comes from the role of the target 
nodes (data nodes).
   
   ### Step 4: Verify Initial Connection
   
   Check the liaison node logs to verify that it successfully connected to the 
data node:
   
   ```bash
   # Look for messages like:
   # "new node is healthy, add it to active queue"
   # "Started CA certificate file monitoring"
   # "TLS file watcher loop started"
   ```
   
   If you see `"new node is healthy, add it to active queue"` in the logs, the 
connection is successful. If you see `"node is unhealthy"` or `"failed to 
re-connect to grpc server"`, check the troubleshooting section below.
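   
   If the liaison output is captured in a file, the check above can be
   scripted. This is a rough sketch: the `liaison.log` path and the
   `liaison_connection_state` helper are assumptions for illustration, and the
   matched strings are only the log messages quoted in this guide:
   
   ```bash
   # Hypothetical helper: reports the most recent connection state in a log file.
   # Assumes the liaison was started with its output redirected to that file.
   liaison_connection_state() {
     last=$(grep -e "new node is healthy, add it to active queue" \
                 -e "node is unhealthy" \
                 -e "failed to re-connect to grpc server" "$1" | tail -n 1)
     case $last in
       *"new node is healthy"*) echo "healthy" ;;
       "") echo "unknown" ;;
       *) echo "unhealthy" ;;
     esac
   }
   
   # Usage: liaison.log is an assumed file name.
   if [ -f liaison.log ]; then
     liaison_connection_state liaison.log
   fi
   ```
   
   Only the last matching line is considered, so a node that was briefly
   unhealthy and then recovered is reported as healthy.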
   
   ### Step 5: Generate New Certificates
   
   When testing certificate reloading, generate new certificates with SAN 
extensions:
   
   ```bash
   cd ~/test-internal-certs
   
   # Generate new CA certificate
   openssl req -x509 -newkey rsa:2048 -keyout ca_key2.pem -out ca_cert2.pem \
     -days 365 -nodes -subj "/CN=TestCA2"
   
   # Generate new server private key
   openssl genrsa -out server_key2.pem 2048
   
   # Create certificate configuration file for new certificate
   cat > server_cert2.conf <<EOF
   [req]
   distinguished_name = req_distinguished_name
   req_extensions = v3_req
   prompt = no
   
   [req_distinguished_name]
   CN = master01
   
   [v3_req]
   subjectAltName = @alt_names
   
   [alt_names]
   DNS.1 = master01
   DNS.2 = localhost
   IP.1 = 127.0.0.1
   EOF
   
   # Generate certificate signing request
   openssl req -new -key server_key2.pem -out server_csr2.pem \
     -subj "/CN=master01" \
     -config server_cert2.conf
   
   # Sign certificate with new CA
   openssl x509 -req -in server_csr2.pem -CA ca_cert2.pem -CAkey ca_key2.pem \
     -CAcreateserial -out server_cert2.pem -days 365 \
     -extensions v3_req -extfile server_cert2.conf
   
   # IMPORTANT: Update certificates in the correct order to avoid connection failures:
   # 1. First, update server certificates on data nodes (so they use
   #    certificates signed by the new CA)
   # 2. Then, update the CA certificate on liaison nodes (so they can verify
   #    the new server certificates)
   
   # Step 1: Update server cert and key on data node first
   # This should trigger reload on data node
   cp server_cert2.pem server_cert.pem && cp server_key2.pem server_key.pem
   
   # Wait a few seconds for data node to reload the server certificate
   sleep 2
   
   # Step 2: Update CA cert file on liaison node
   # This should trigger reload and reconnection on liaison for client connections
   cp ca_cert2.pem ca_cert.pem
   ```
   
   **Note**: 
   - **Certificate Update Order**: When updating both CA and server 
certificates, you must update the server certificates **first**, then update 
the CA certificate. This ensures that when the liaison node reconnects with the 
new CA certificate, the data node is already using a server certificate signed 
by that new CA. If you update the CA certificate first, the liaison will try to 
reconnect but fail because the data node is still using a certificate signed by 
the old CA.
   - When updating server certificates, you need to update them on both liaison 
and data nodes if they are using the same certificate files. The reloader will 
automatically detect the changes and reload the certificates.
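   
   Since a mismatched file set only shows up as connection failures after the
   reload, it can be worth validating the new files before copying them into
   place. A minimal sketch, where `validate_cert_set` is a hypothetical helper:
   it checks that the server certificate chains to the new CA and that the
   private key belongs to the certificate by comparing public keys:
   
   ```bash
   # Hypothetical helper: succeeds only if the certificate verifies against
   # the CA and the private key matches the certificate's public key.
   validate_cert_set() {
     ca=$1
     cert=$2
     key=$3
     openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1 || return 1
     cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 1
     key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 1
     [ "$cert_pub" = "$key_pub" ]
   }
   
   # Usage: run in the certificate directory before the cp commands of Step 5.
   if [ -f ca_cert2.pem ] && [ -f server_cert2.pem ] && [ -f server_key2.pem ]; then
     if validate_cert_set ca_cert2.pem server_cert2.pem server_key2.pem; then
       echo "new certificate set is consistent"
     else
       echo "new certificate set is inconsistent; do not copy it into place" >&2
     fi
   fi
   ```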
   
   ### Step 6: Verify Certificate Reload
   
   Wait a few seconds (the reloader has a 500ms debounce), then check the logs:
   
   **Liaison node logs:**
   ```bash
   # For CA certificate reload:
   # "CA certificate updated, reconnecting clients"
   # "successfully reconnected client after CA certificate update"
   
   # For server certificate reload:
   # "Successfully updated TLS certificate after content change"
   # "TLS certificate updated in memory"
   # "Starting TLS file monitoring"
   ```
   
   **Data node logs:**
   ```bash
   # Look for messages like:
   # "Successfully updated TLS certificate after content change"
   # "TLS certificate updated in memory"
   # "Started TLS file monitoring for queue server"
   ```
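
   Rather than waiting a fixed time, a script can poll the log until the
   expected message appears. This is a sketch under the assumption that node
   output is redirected to a file; `wait_for_log` and the `data.log` path are
   hypothetical names, and the message is one of those listed above:
   
   ```bash
   # Hypothetical helper: polls a log file until a message appears or the
   # timeout (in seconds, default 10) expires.
   wait_for_log() {
     log=$1
     message=$2
     timeout=${3:-10}
     elapsed=0
     while [ "$elapsed" -lt "$timeout" ]; do
       if grep -qF "$message" "$log" 2>/dev/null; then
         return 0
       fi
       sleep 1
       elapsed=$((elapsed + 1))
     done
     # Final check so a message written during the last sleep still counts.
     grep -qF "$message" "$log" 2>/dev/null
   }
   
   # Usage: data.log is an assumed file name.
   if [ -f data.log ]; then
     if wait_for_log data.log "TLS certificate updated in memory" 10; then
       echo "data node reloaded its certificate"
     else
       echo "no reload message within 10 seconds" >&2
     fi
   fi
   ```
   
   The helper matches any occurrence in the file, including messages from an
   earlier reload, so it is best used against a log that starts fresh for each
   test run.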

