gemini-code-assist[bot] commented on code in PR #35812: URL: https://github.com/apache/beam/pull/35812#discussion_r2260830323
########## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java: ########## @@ -1098,14 +1098,39 @@ public Read<K, V> withRedistribute() { return toBuilder().setRedistributed(true).build(); } + /** + * <p>Hints to the runner that it can relax exactly-once processing guarantees, allowing + * duplicates in at-least-once processing mode of Kafka inputs. + * + * <p>Must be used with {@link KafkaIO#withRedistribute()}. + * + * <p>Not compatible with {@link KafkaIO#withOffsetDeduplication()}. Review Comment:  The Javadoc for this method can be improved for clarity and to prevent runtime errors. * The current description of the dependency on `withRedistribute()` says "Must be used with...". However, the implementation only logs a warning if `withRedistribute()` is not used. The Javadoc should reflect this behavior to avoid confusion. * The method accepts a `Boolean` wrapper type, but passing `null` will cause a `NullPointerException` because the underlying builder method expects a primitive `boolean`. This is a significant correctness issue and should be documented to prevent misuse. * Adding `@param` and `@return` tags would improve documentation clarity, which is a common practice for public APIs in this file (e.g., `withCreateTime`). Here is a suggested improvement: ```suggestion * <p>Hints to the runner that it can relax exactly-once processing guarantees, allowing * duplicates in at-least-once processing mode for Kafka inputs. * * <p>This setting is only effective when used with {@link KafkaIO#withRedistribute()}. * * <p>Not compatible with {@link KafkaIO#withOffsetDeduplication()}. * * @param allowDuplicates specifies whether to allow duplicates. Passing {@code null} will cause a * {@link NullPointerException}. * @return an updated {@link Read} transform. ``` ########## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java: ########## @@ -1098,14 +1098,39 @@ return toBuilder().setRedistributed(true).build(); } + /** + * <p>Hints to the runner that it can relax exactly-once processing guarantees, allowing + * duplicates in at-least-once processing mode of Kafka inputs. + * + * <p>Must be used with {@link KafkaIO#withRedistribute()}. + * + * <p>Not compatible with {@link KafkaIO#withOffsetDeduplication()}. + */ public Read<K, V> withAllowDuplicates(Boolean allowDuplicates) { return toBuilder().setAllowDuplicates(allowDuplicates).build(); } + /** + * <p>Redistributes Kafka messages into a distinct number of keys for processing in subsequent + * steps. + * + * <p>Specifying an explicit number of keys is generally receommended over redistributing Review Comment:  There is a typo in "receommended". It should be "recommended". ```suggestion * <p>Specifying an explicit number of keys is generally recommended over redistributing ``` ########## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java: ########## @@ -1098,14 +1098,39 @@ return toBuilder().setRedistributed(true).build(); } + /** + * <p>Hints to the runner that it can relax exactly-once processing guarantees, allowing + * duplicates in at-least-once processing mode of Kafka inputs. + * + * <p>Must be used with {@link KafkaIO#withRedistribute()}. + * + * <p>Not compatible with {@link KafkaIO#withOffsetDeduplication()}. + */ public Read<K, V> withAllowDuplicates(Boolean allowDuplicates) { return toBuilder().setAllowDuplicates(allowDuplicates).build(); } + /** + * <p>Redistributes Kafka messages into a distinct number of keys for processing in subsequent + * steps. + * + * <p>Specifying an explicit number of keys is generally receommended over redistributing + * into an unbounded key space. + * + * <p>Must be used with {@link KafkaIO#withRedistribute()}. + */ public Read<K, V> withRedistributeNumKeys(int redistributeNumKeys) { return toBuilder().setRedistributeNumKeys(redistributeNumKeys).build(); } + /** + * <p>Hints to the runner to optimize the redistribute by minimizing the amount of data required + * for persistence as part of the redistribute operation. + * + * <p>Must be used with {@link KafkaIO#withRedistribute()}. + * + * <p>Not compatible with {@link KafkaIO#withAllowDuplicates()}. + */ Review Comment:  The current Javadoc is a bit vague and focuses on the implementation detail ("minimizing the amount of data required for persistence") rather than the user-facing behavior. It would be more helpful to explain what "offset deduplication" means and what its effects are. Also, the method accepts a `Boolean` and `null` is treated as `false`. This behavior should be documented in a `@param` tag for clarity. Here is a suggested improvement: ```suggestion * <p>Hints to the runner to perform offset-based deduplication during redistribution. This can * be used to achieve exactly-once processing guarantees and may optimize the redistribute * operation by minimizing the amount of data required for persistence. * * <p>Must be used with {@link KafkaIO#withRedistribute()}. * * <p>Not compatible with {@link KafkaIO#withAllowDuplicates(Boolean)} set to {@code true}. * * @param offsetDeduplication specifies whether to enable offset-based deduplication. A {@code null} * value is treated as {@code false}. * @return an updated {@link Read} transform. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org