adoroszlai commented on code in PR #4943:
URL: https://github.com/apache/ozone/pull/4943#discussion_r1250278996
##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/security/ssl/ReloadingX509KeyManager.java:
##########
@@ -85,15 +89,65 @@ public ReloadingX509KeyManager(String type,
CertificateClient caClient)
@Override
public String chooseEngineClientAlias(String[] strings,
Principal[] principals, SSLEngine sslEngine) {
- return keyManagerRef.get()
+ String ret = keyManagerRef.get()
.chooseEngineClientAlias(strings, principals, sslEngine);
+
+ if (ret == null) {
+ /*
+ Workaround to address that netty tc-native cannot handle the dynamic
+ key and certificate refresh well. What happens is during the setup of
+ the grpc channel, an SSLContext is created, which is
+ ReferenceCountedOpenSslServerContext in the native tc-native case.
+ This class uses the TrustManager's getAcceptedIssuers() as the trusted
+ CA certificate list. The list is not updated after channel is built.
+ With the list being used to present the Principals during the mTLS
+ authentication via the Netty channel under Ratis implementation,
+ the counterpart(client) KeyManager's
+ chooseEngineClientAlias(String, Principal[], SSLEngine) method is
+ called with this old root certificate subject principal, which is now
+ not available in the new Key Manager after refreshed, so the method
+ will return null, which cause the mutual TLS connection establish
+ failure.
+
+ Example error message:
+ Engine client aliases for RSA, DH_RSA, EC, EC_RSA, EC_EC,
+ O=CID-f9f2b2cf-a784-49d7-8577-5d3b13bf0b46,
+ OU=9f52487c-f8f9-45ee-bb56-aca60b56327f,
+ [email protected],
+ org.apache.ratis.thirdparty.io.netty.handler.ssl.OpenSslEngine@5eec0d10
+ is null
+
+ Example success message:
+ Engine client aliases for RSA, DH_RSA, EC, EC_RSA, EC_EC,
+ O=CID-f9f2b2cf-a784-49d7-8577-5d3b13bf0b46,
+ OU=9f52487c-f8f9-45ee-bb56-aca60b56327f,
+ [email protected],
+ org.apache.ratis.thirdparty.io.netty.handler.ssl.OpenSslEngine@5eec0d10
+ is scm/sub-ca_key
+ */
+ ret = alias;
+ LOG.info("Engine client aliases for {}, {}, {} is returned as {}",
+ strings == null ? "" : Arrays.stream(strings).map(Object::toString)
+ .collect(Collectors.joining(", ")),
+ principals == null ? "" : Arrays.stream(principals)
+ .map(Object::toString).collect(Collectors.joining(", ")),
+ sslEngine == null ? "" : sslEngine.toString(), ret);
Review Comment:
```suggestion
strings == null ? "" : Arrays.toString(strings),
principals == null ? "" : Arrays.toString(principals),
sslEngine == null ? "" : sslEngine, ret);
```
Also, why do you prefer `""` instead of `null` in the log?
##########
hadoop-ozone/dist/src/main/compose/testlib.sh:
##########
@@ -323,15 +338,20 @@ wait_for_port(){
wait_for_execute_command(){
local container=$1
local timeout=$2
- local command=$3
+ shift 2
+ local command=$@
#Reset the timer
SECONDS=0
while [[ $SECONDS -lt $timeout ]]; do
- if docker-compose exec -T $container bash -c '$command'; then
- echo "$command succeed"
- return
+ set +e
+ docker-compose exec -T $container /bin/bash -c "$command"
+ status=$?
+ set -e
+ if [ $status -eq 0 ] ; then
+ echo "$command succeed"
+ return;
Review Comment:
This was recently changed, please avoid reverting to previous implementation.
```suggestion
if docker-compose exec -T $container /bin/bash -c "$command"; then
echo "$command succeed"
```
##########
hadoop-ozone/dist/src/main/compose/testlib.sh:
##########
@@ -262,6 +263,20 @@ execute_command_in_container(){
docker-compose exec -T "$@"
}
+## @description Execute specific commands in docker container
+## @param container name
+## @param specific commands to execute
+execute_commands_in_container(){
+ local container=$1
+ shift 1
+ local command=$@
+
+ set -e
+ # shellcheck disable=SC2068
+ docker-compose exec -T $container /bin/bash -c "$command"
+ set +e
Review Comment:
```suggestion
# shellcheck disable=SC2068
docker-compose exec -T $container /bin/bash -c "$command"
```
`execute_command_in_container` was recently changed to avoid changing
`exit-on-error` setting, please apply the same here.
##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/security/ssl/ReloadingX509KeyManager.java:
##########
@@ -85,15 +89,65 @@ public ReloadingX509KeyManager(String type,
CertificateClient caClient)
@Override
public String chooseEngineClientAlias(String[] strings,
Principal[] principals, SSLEngine sslEngine) {
- return keyManagerRef.get()
+ String ret = keyManagerRef.get()
.chooseEngineClientAlias(strings, principals, sslEngine);
+
+ if (ret == null) {
+ /*
+ Workaround to address that netty tc-native cannot handle the dynamic
+ key and certificate refresh well. What happens is during the setup of
+ the grpc channel, an SSLContext is created, which is
+ ReferenceCountedOpenSslServerContext in the native tc-native case.
+ This class uses the TrustManager's getAcceptedIssuers() as the trusted
+ CA certificate list. The list is not updated after channel is built.
+ With the list being used to present the Principals during the mTLS
+ authentication via the Netty channel under Ratis implementation,
+ the counterpart(client) KeyManager's
+ chooseEngineClientAlias(String, Principal[], SSLEngine) method is
+ called with this old root certificate subject principal, which is now
+ not available in the new Key Manager after refreshed, so the method
+ will return null, which cause the mutual TLS connection establish
+ failure.
+
+ Example error message:
+ Engine client aliases for RSA, DH_RSA, EC, EC_RSA, EC_EC,
+ O=CID-f9f2b2cf-a784-49d7-8577-5d3b13bf0b46,
+ OU=9f52487c-f8f9-45ee-bb56-aca60b56327f,
+ [email protected],
+ org.apache.ratis.thirdparty.io.netty.handler.ssl.OpenSslEngine@5eec0d10
+ is null
+
+ Example success message:
+ Engine client aliases for RSA, DH_RSA, EC, EC_RSA, EC_EC,
+ O=CID-f9f2b2cf-a784-49d7-8577-5d3b13bf0b46,
+ OU=9f52487c-f8f9-45ee-bb56-aca60b56327f,
+ [email protected],
+ org.apache.ratis.thirdparty.io.netty.handler.ssl.OpenSslEngine@5eec0d10
+ is scm/sub-ca_key
+ */
+ ret = alias;
+ LOG.info("Engine client aliases for {}, {}, {} is returned as {}",
+ strings == null ? "" : Arrays.stream(strings).map(Object::toString)
+ .collect(Collectors.joining(", ")),
+ principals == null ? "" : Arrays.stream(principals)
+ .map(Object::toString).collect(Collectors.joining(", ")),
+ sslEngine == null ? "" : sslEngine.toString(), ret);
+ }
+ return ret;
}
@Override
public String chooseEngineServerAlias(String s, Principal[] principals,
SSLEngine sslEngine) {
- return keyManagerRef.get()
+ String ret = keyManagerRef.get()
.chooseEngineServerAlias(s, principals, sslEngine);
+ if (ret == null && LOG.isDebugEnabled()) {
+ LOG.debug("Engine server aliases for {}, {}, {} is null", s,
+ principals == null ? "" : Arrays.stream(principals)
+ .map(Object::toString).collect(Collectors.joining(", ")),
+ sslEngine == null ? "" : sslEngine.toString());
Review Comment:
```suggestion
principals == null ? "" : Arrays.toString(principals),
sslEngine == null ? "" : sslEngine);
```
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/SCMSecurityProtocolServerSideTranslatorPB.java:
##########
@@ -253,7 +253,7 @@ public SCMGetCertResponseProto getSCMCertificate(
throw createNotHAException();
}
String certificate = impl.getSCMCertificate(request.getScmDetails(),
- request.getCSR());
+ request.getCSR(), request.hasRenew() ? request.getRenew() : false);
Review Comment:
Nit:
```suggestion
request.getCSR(), request.hasRenew() && request.getRenew());
```
##########
hadoop-ozone/dist/src/main/compose/ozonesecure-ha/test-root-ca-rotation.sh:
##########
@@ -0,0 +1,93 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#suite:HA-secure
+
+COMPOSE_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+export COMPOSE_DIR
+
+export SECURITY_ENABLED=true
+export OM_SERVICE_ID="omservice"
+export SCM=scm1.org
+export COMPOSE_FILE=docker-compose.yaml:root-ca-rotation.yaml
+
+: ${OZONE_BUCKET_KEY_NAME:=key1}
+
+# shellcheck source=/dev/null
+source "$COMPOSE_DIR/../testlib.sh"
+
+start_docker_env
+
+execute_command_in_container kms hadoop key create ${OZONE_BUCKET_KEY_NAME}
+
+execute_robot_test scm1.org kinit.robot
+
+# verify root CA rotation monitor task is active on leader
+wait_for_execute_command scm1.org 30 "jps | grep
StorageContainerManagerStarter | sed 's/StorageContainerManagerStarter//' |
xargs | xargs -I {} jstack {} | grep 'RootCARotationManager-Active'"
+
+# wait and verify root CA is rotated
+wait_for_execute_command scm1.org 180 "ozone admin cert info 2"
+
+# transfer leader to scm2.org
+execute_robot_test scm1.org scmha/scm-leader-transfer.robot
+wait_for_execute_command scm1.org 30 "jps | grep
StorageContainerManagerStarter | sed 's/StorageContainerManagerStarter//' |
xargs | xargs -I {} jstack {} | grep 'RootCARotationManager-Inactive'"
+
+# verify om operations
+execute_commands_in_container scm1.org "ozone sh volume create /r-v1 && ozone
sh bucket create /r-v1/r-b1"
+
+# verify scm operations
+execute_robot_test scm1.org admincli/pipeline.robot
+
+# wait for next root CA rotation
+wait_for_execute_command scm1.org 180 "ozone admin cert info 3"
+
+# bootstrap new SCM4 and verify certificate
+docker-compose up -d scm4.org
+wait_for_port scm4.org 9894 120
+execute_robot_test scm4.org kinit.robot
+wait_for_execute_command scm4.org 120 "ozone admin scm roles | grep scm4.org"
+wait_for_execute_command scm4.org 30 "ozone admin cert list --role=scm | grep
scm4.org"
+
+# wait for next root CA rotation
+wait_for_execute_command scm4.org 180 "ozone admin cert info 4"
+
+#transfer leader to scm4.org
+export TARGET_SCM=scm4.org
+execute_robot_test scm4.org scmha/scm-leader-transfer.robot
Review Comment:
```suggestion
execute_robot_test scm4.org -v "TARGET_SCM:scm4.org"
scmha/scm-leader-transfer.robot
```
##########
hadoop-ozone/dist/src/main/compose/ozonesecure-ha/test-root-ca-rotation.sh:
##########
@@ -0,0 +1,93 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#suite:HA-secure
+
+COMPOSE_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+export COMPOSE_DIR
+
+export SECURITY_ENABLED=true
+export OM_SERVICE_ID="omservice"
+export SCM=scm1.org
+export COMPOSE_FILE=docker-compose.yaml:root-ca-rotation.yaml
+
+: ${OZONE_BUCKET_KEY_NAME:=key1}
+
+# shellcheck source=/dev/null
+source "$COMPOSE_DIR/../testlib.sh"
+
+start_docker_env
+
+execute_command_in_container kms hadoop key create ${OZONE_BUCKET_KEY_NAME}
+
+execute_robot_test scm1.org kinit.robot
+
+# verify root CA rotation monitor task is active on leader
+wait_for_execute_command scm1.org 30 "jps | grep
StorageContainerManagerStarter | sed 's/StorageContainerManagerStarter//' |
xargs | xargs -I {} jstack {} | grep 'RootCARotationManager-Active'"
+
+# wait and verify root CA is rotated
+wait_for_execute_command scm1.org 180 "ozone admin cert info 2"
+
+# transfer leader to scm2.org
+execute_robot_test scm1.org scmha/scm-leader-transfer.robot
+wait_for_execute_command scm1.org 30 "jps | grep
StorageContainerManagerStarter | sed 's/StorageContainerManagerStarter//' |
xargs | xargs -I {} jstack {} | grep 'RootCARotationManager-Inactive'"
+
+# verify om operations
+execute_commands_in_container scm1.org "ozone sh volume create /r-v1 && ozone
sh bucket create /r-v1/r-b1"
+
+# verify scm operations
+execute_robot_test scm1.org admincli/pipeline.robot
+
+# wait for next root CA rotation
+wait_for_execute_command scm1.org 180 "ozone admin cert info 3"
+
+# bootstrap new SCM4 and verify certificate
+docker-compose up -d scm4.org
+wait_for_port scm4.org 9894 120
+execute_robot_test scm4.org kinit.robot
+wait_for_execute_command scm4.org 120 "ozone admin scm roles | grep scm4.org"
+wait_for_execute_command scm4.org 30 "ozone admin cert list --role=scm | grep
scm4.org"
+
+# wait for next root CA rotation
+wait_for_execute_command scm4.org 180 "ozone admin cert info 4"
+
+#transfer leader to scm4.org
+export TARGET_SCM=scm4.org
+execute_robot_test scm4.org scmha/scm-leader-transfer.robot
+
+# add new datanode4 and verify certificate
+docker-compose up -d datanode4
+wait_for_port datanode4 9856 60
+wait_for_execute_command scm4.org 60 "ozone admin datanode list | grep
datanode4"
+
+#transfer leader to scm3.org
+execute_robot_test scm3.org kinit.robot
+export TARGET_SCM=scm3.org
+execute_robot_test scm4.org scmha/scm-leader-transfer.robot
Review Comment:
```suggestion
execute_robot_test scm4.org -v "TARGET_SCM:scm3.org"
scmha/scm-leader-transfer.robot
```
##########
hadoop-ozone/dist/src/main/compose/testlib.sh:
##########
@@ -203,6 +203,7 @@ execute_robot_test(){
-v OZONE_DIR:"${OZONE_DIR}" \
-v SECURITY_ENABLED:"${SECURITY_ENABLED}" \
-v SCM:"${SCM}" \
+ -v TARGET_SCM:"${TARGET_SCM:-scm2.org}" \
Review Comment:
`TARGET_SCM` is specific to few tests, it should be passed in
`ozonesecure-ha/test-root-ca-rotation.sh`.
```suggestion
```
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/security/RootCARotationHandler.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.security;
+
+import org.apache.hadoop.hdds.scm.metadata.Replicate;
+
+import java.io.IOException;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * This interface defines APIs for sub-ca rotation instructions.
+ */
+public interface RootCARotationHandler {
+
+ /**
+ * Notify SCM peers to do sub-ca rotation preparation and replicate
+ * this operation through RATIS.
+ * @param rootCertId the new root certificate serial ID
+ * @throws IOException on failure to persist configuration
+ */
+ @Replicate
+ void rotationPrepare(String rootCertId)
+ throws IOException, TimeoutException;
+
+ @Replicate
+ void rotationPrepareAck(String rootCertId, String scmCertId, String scmId)
+ throws IOException, TimeoutException;
+
+ @Replicate
+ void rotationCommit(String rootCertId)
+ throws IOException, TimeoutException;
+
+ @Replicate
+ void rotationCommitted(String rootCertId)
+ throws IOException, TimeoutException;
Review Comment:
`@Replicate` no longer requires `TimeoutException` to be declared.
##########
hadoop-ozone/dist/src/main/smoketest/scmha/scm-leader-transfer.robot:
##########
@@ -21,6 +21,7 @@ Resource ../commonlib.robot
Test Timeout 5 minutes
*** Variables ***
+${TARGET_SCM}= %{TARGET_SCM=scm2.org}
Review Comment:
Robot variable is passed via `-v` option. This can be simplified to avoid
using an environment variable, too.
```suggestion
${TARGET_SCM} scm2.org
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]