gemini-code-assist[bot] commented on code in PR #36346:
URL: https://github.com/apache/beam/pull/36346#discussion_r2406462547
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java:
##########
@@ -413,6 +414,40 @@ public Long create(PipelineOptions options) {
void setUserAgent(String userAgent);
+ /**
+ * A string defining whether GroupByKey transforms should be replaced by
GroupByEncryptedKey
+ *
+ * <p>Beam will infer the secret type and value based on the secret itself.
This guarantees that
+ * any data at rest during the performing a GBK, so this can be used to
guarantee that data is not
+ * unencrypted. Runners with this behavior include the Dataflow, Flink, and
Spark runners. The
+ * option should be structured like:
+ *
+ * <pre><code>
+ * --encrypt=type:<secret_type>;<secret_param>:<value>
+ * </code></pre>
+ *
+ * for example:
+ *
+ * <pre><code>
+ * --encrypt=type:GcpSecret;version_name:my_secret/versions/latest"
+ * </code></pre>
+ *
+ * All variables should use snake case to allow consistency across languages.
+ */
+ @Description(
+ "When set, will replace all GroupByKey transforms in the pipeline the
option. Beam will"
+ + " infer the secret type and value based on the secret itself. This
guarantees that"
+ + " any data at rest during the performing a GBK, so this can be
used to guarantee"
+ + " that data is not unencrypted. Runners with this behavior include
the Dataflow,"
+ + " Flink, and Spark runners. The option should be structured like:"
+ + " --encrypt=type:<secret_type>;<secret_param>:<value>, for example
"
+ + " --encrypt=type:GcpSecret;version_name:my_secret/versions/latest.
All variables "
+ + " should use snake case to allow consistency across languages.")
Review Comment:

The documentation for the new `GBEK` option has a few small issues that
could confuse users:
* The Javadoc and `@Description` refer to the command-line flag as
`--encrypt`, but based on the method name `getGBEK`, the flag will be `--GBEK`
(or `--gbek`). This should be made consistent.
* There's a typo in the Javadoc example: `.../versions/latest"` has an
extra quote at the end.
* The `@Description` has a small grammatical error: "...in the pipeline
the option. Beam will...".
I've suggested a revision to address these points and improve clarity.
```java
* A string defining whether GroupByKey transforms should be replaced by
GroupByEncryptedKey.
*
* <p>When this option is set, Beam will replace all {@link
org.apache.beam.sdk.transforms.GroupByKey}
* transforms with {@link
org.apache.beam.sdk.transforms.GroupByEncryptedKey}. This guarantees
* that any data at rest while performing a GBK is encrypted. This
behavior is supported by
* runners like Dataflow, Flink, and Spark.
*
* <p>The option should be provided on the command line as e.g.
* {@code --GBEK="type:<secret_type>;<secret_param>:<value>"}.
*
* <p>For example, to use a secret from GCP Secret Manager:
*
* <pre><code>
* --GBEK="type:GcpSecret;version_name:my_secret/versions/latest"
* </code></pre>
*
* <p>All parameter names should use snake_case for consistency across
languages.
*/
@Description(
"When set, replaces all GroupByKey transforms in the pipeline with
GroupByEncryptedKey, "
+ "using the provided secret. This guarantees that any data at
rest during a GBK "
+ "operation is encrypted. Supported by Dataflow, Flink, and Spark
runners. "
+ "The option should be structured as: "
+ "'type:<secret_type>;<secret_param>:<value>'. For example: "
+ "'type:GcpSecret;version_name:my_secret/versions/latest'. All
parameter names "
+ "should use snake_case for consistency across languages.")
```
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/util/Secret.java:
##########
@@ -33,4 +38,48 @@ public interface Secret extends Serializable {
* @return The secret as a byte array.
*/
byte[] getSecretBytes();
+
+ static Secret parseSecretOption(String secretOption) {
+ Map<String, String> paramMap = new HashMap<>();
+ for (String param : secretOption.split(";", -1)) {
+ String[] parts = param.split(":", 2);
+ if (parts.length == 2) {
+ paramMap.put(parts[0], parts[1]);
+ }
+ }
+
+ if (!paramMap.containsKey("type")) {
+ throw new RuntimeException("Secret string must contain a valid type
parameter");
+ }
+
+ String secretType = paramMap.get("type");
+ paramMap.remove("type");
+
+ if (secretType == null) {
+ throw new RuntimeException("Secret string must contain a valid value for
type parameter");
+ }
Review Comment:

This null check for `secretType` appears to be redundant. The `paramMap` is
populated from `secretOption.split(";", -1)` and `param.split(":", 2)`. If
`secretOption` does not contain `type:`, `paramMap.containsKey("type")` will be
false, and an exception will be thrown on line 52. If it does contain `type:`,
`paramMap.get("type")` will return a non-null (though possibly empty) string.
Therefore, `secretType` can never be `null` at this point.
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByEncryptedKey.java:
##########
@@ -67,7 +77,26 @@ private GroupByEncryptedKey(Secret hmacKey) {
* @return A {@link GroupByEncryptedKey} transform.
*/
public static <K, V> GroupByEncryptedKey<K, V> create(Secret hmacKey) {
- return new GroupByEncryptedKey<>(hmacKey);
+ GroupByKey<byte[], KV<byte[], byte[]>> gbk = GroupByKey.create();
Review Comment:

The `gbk` variable is created but not used, as a new `GroupByKey` instance
is created in the return statement on the next line. You can remove this line
to improve code clarity.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]