gemini-code-assist[bot] commented on code in PR #36346:
URL: https://github.com/apache/beam/pull/36346#discussion_r2406462547


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java:
##########
@@ -413,6 +414,40 @@ public Long create(PipelineOptions options) {
 
   void setUserAgent(String userAgent);
 
+  /**
+   * A string defining whether GroupByKey transforms should be replaced by 
GroupByEncryptedKey
+   *
+   * <p>Beam will infer the secret type and value based on the secret itself. 
This guarantees that
+   * any data at rest during the performing a GBK, so this can be used to 
guarantee that data is not
+   * unencrypted. Runners with this behavior include the Dataflow, Flink, and 
Spark runners. The
+   * option should be structured like:
+   *
+   * <pre><code>
+   * --encrypt=type:<secret_type>;<secret_param>:<value>
+   * </code></pre>
+   *
+   * for example:
+   *
+   * <pre><code>
+   * --encrypt=type:GcpSecret;version_name:my_secret/versions/latest"
+   * </code></pre>
+   *
+   * All variables should use snake case to allow consistency across languages.
+   */
+  @Description(
+      "When set, will replace all GroupByKey transforms in the pipeline the 
option. Beam will"
+          + " infer the secret type and value based on the secret itself. This 
guarantees that"
+          + " any data at rest during the performing a GBK, so this can be 
used to guarantee"
+          + " that data is not unencrypted. Runners with this behavior include 
the Dataflow,"
+          + " Flink, and Spark runners. The option should be structured like:"
+          + " --encrypt=type:<secret_type>;<secret_param>:<value>, for example 
"
+          + " --encrypt=type:GcpSecret;version_name:my_secret/versions/latest. 
All variables "
+          + " should use snake case to allow consistency across languages.")

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The documentation for the new `GBEK` option has a few small issues that 
could confuse users:
   
   *   The Javadoc and `@Description` refer to the command-line flag as 
`--encrypt`, but based on the method name `getGBEK`, the flag will be `--GBEK` 
(or `--gbek`). This should be made consistent.
   *   There's a typo in the Javadoc example: `.../versions/latest"` has an 
extra quote at the end.
   *   The `@Description` has a small grammatical error: "...in the pipeline 
the option. Beam will...".
   
   I've suggested a revision to address these points and improve clarity.
   
   ```java
      * A string defining whether GroupByKey transforms should be replaced by 
GroupByEncryptedKey.
      *
      * <p>When this option is set, Beam will replace all {@link 
org.apache.beam.sdk.transforms.GroupByKey}
      * transforms with {@link 
org.apache.beam.sdk.transforms.GroupByEncryptedKey}. This guarantees
      * that any data at rest while performing a GBK is encrypted. This 
behavior is supported by
      * runners like Dataflow, Flink, and Spark.
      *
      * <p>The option should be provided on the command line as e.g.
      * {@code --GBEK="type:<secret_type>;<secret_param>:<value>"}.
      *
      * <p>For example, to use a secret from GCP Secret Manager:
      *
      * <pre><code>
      * --GBEK="type:GcpSecret;version_name:my_secret/versions/latest"
      * </code></pre>
      *
      * <p>All parameter names should use snake_case for consistency across 
languages.
      */
     @Description(
         "When set, replaces all GroupByKey transforms in the pipeline with 
GroupByEncryptedKey, "
             + "using the provided secret. This guarantees that any data at 
rest during a GBK "
             + "operation is encrypted. Supported by Dataflow, Flink, and Spark 
runners. "
             + "The option should be structured as: "
             + "'type:<secret_type>;<secret_param>:<value>'. For example: "
             + "'type:GcpSecret;version_name:my_secret/versions/latest'. All 
parameter names "
             + "should use snake_case for consistency across languages.")
   ```



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/util/Secret.java:
##########
@@ -33,4 +38,48 @@ public interface Secret extends Serializable {
    * @return The secret as a byte array.
    */
   byte[] getSecretBytes();
+
+  static Secret parseSecretOption(String secretOption) {
+    Map<String, String> paramMap = new HashMap<>();
+    for (String param : secretOption.split(";", -1)) {
+      String[] parts = param.split(":", 2);
+      if (parts.length == 2) {
+        paramMap.put(parts[0], parts[1]);
+      }
+    }
+
+    if (!paramMap.containsKey("type")) {
+      throw new RuntimeException("Secret string must contain a valid type 
parameter");
+    }
+
+    String secretType = paramMap.get("type");
+    paramMap.remove("type");
+
+    if (secretType == null) {
+      throw new RuntimeException("Secret string must contain a valid value for 
type parameter");
+    }

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   This null check for `secretType` appears to be redundant. The `paramMap` is 
populated from `secretOption.split(";", -1)` and `param.split(":", 2)`. If 
`secretOption` does not contain `type:`, `paramMap.containsKey("type")` will be 
false, and an exception will be thrown on line 52. If it does contain `type:`, 
`paramMap.get("type")` will return a non-null (though possibly empty) string. 
Therefore, `secretType` can never be `null` at this point.



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByEncryptedKey.java:
##########
@@ -67,7 +77,26 @@ private GroupByEncryptedKey(Secret hmacKey) {
    * @return A {@link GroupByEncryptedKey} transform.
    */
   public static <K, V> GroupByEncryptedKey<K, V> create(Secret hmacKey) {
-    return new GroupByEncryptedKey<>(hmacKey);
+    GroupByKey<byte[], KV<byte[], byte[]>> gbk = GroupByKey.create();

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The `gbk` variable is created but not used, as a new `GroupByKey` instance 
is created in the return statement on the next line. You can remove this line 
to improve code clarity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to