Re: [PR] Add javadoc comments for public methods related to KafkaIO redistribute. [beam]

via GitHub Thu, 07 Aug 2025 09:24:31 -0700


gemini-code-assist[bot] commented on code in PR #35812:
URL: https://github.com/apache/beam/pull/35812#discussion_r2260830323



##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java:
##########
@@ -1098,14 +1098,39 @@ public Read<K, V> withRedistribute() {
       return toBuilder().setRedistributed(true).build();
     }
 
+    /**
+     * <p>Hints to the runner that it can relax exactly-once processing 
guarantees, allowing
+     * duplicates in at-least-once processing mode of Kafka inputs.
+     *
+     * <p>Must be used with {@link KafkaIO#withRedistribute()}.
+     *
+     * <p>Not compatible with {@link KafkaIO#withOffsetDeduplication()}.

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The Javadoc for this method can be improved for clarity and to prevent 
runtime errors.
   
   *   The current description of the dependency on `withRedistribute()` says 
"Must be used with...". However, the implementation only logs a warning if 
`withRedistribute()` is not used. The Javadoc should reflect this behavior to 
avoid confusion.
   *   The method accepts a `Boolean` wrapper type, but passing `null` will 
cause a `NullPointerException` because the underlying builder method expects a 
primitive `boolean`. This is a significant correctness issue and should be 
documented to prevent misuse.
   *   Adding `@param` and `@return` tags would improve documentation clarity, 
which is a common practice for public APIs in this file (e.g., 
`withCreateTime`).
   
   Here is a suggested improvement:
   
   ```suggestion
        * <p>Hints to the runner that it can relax exactly-once processing 
guarantees, allowing
        * duplicates in at-least-once processing mode for Kafka inputs.
        *
        * <p>This setting is only effective when used with {@link 
KafkaIO#withRedistribute()}.
        *
        * <p>Not compatible with {@link KafkaIO#withOffsetDeduplication()}.
        *
        * @param allowDuplicates specifies whether to allow duplicates. Passing 
{@code null} will cause a
        *     {@link NullPointerException}.
        * @return an updated {@link Read} transform.
   ```



##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java:
##########
@@ -1098,14 +1098,39 @@
       return toBuilder().setRedistributed(true).build();
     }
 
+    /**
+     * <p>Hints to the runner that it can relax exactly-once processing 
guarantees, allowing
+     * duplicates in at-least-once processing mode of Kafka inputs.
+     *
+     * <p>Must be used with {@link KafkaIO#withRedistribute()}.
+     *
+     * <p>Not compatible with {@link KafkaIO#withOffsetDeduplication()}.
+     */
     public Read<K, V> withAllowDuplicates(Boolean allowDuplicates) {
       return toBuilder().setAllowDuplicates(allowDuplicates).build();
     }
 
+    /**
+     * <p>Redistributes Kafka messages into a distinct number of keys for 
processing in subsequent
+     * steps.
+     *
+     * <p>Specifying an explicit number of keys is generally receommended over 
redistributing

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   There is a typo in "receommended". It should be "recommended".
   
   ```suggestion
        * <p>Specifying an explicit number of keys is generally recommended 
over redistributing
   ```



##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java:
##########
@@ -1098,14 +1098,39 @@
       return toBuilder().setRedistributed(true).build();
     }
 
+    /**
+     * <p>Hints to the runner that it can relax exactly-once processing 
guarantees, allowing
+     * duplicates in at-least-once processing mode of Kafka inputs.
+     *
+     * <p>Must be used with {@link KafkaIO#withRedistribute()}.
+     *
+     * <p>Not compatible with {@link KafkaIO#withOffsetDeduplication()}.
+     */
     public Read<K, V> withAllowDuplicates(Boolean allowDuplicates) {
       return toBuilder().setAllowDuplicates(allowDuplicates).build();
     }
 
+    /**
+     * <p>Redistributes Kafka messages into a distinct number of keys for 
processing in subsequent
+     * steps.
+     *
+     * <p>Specifying an explicit number of keys is generally receommended over 
redistributing
+     * into an unbounded key space.
+     *
+     * <p>Must be used with {@link KafkaIO#withRedistribute()}.
+     */
     public Read<K, V> withRedistributeNumKeys(int redistributeNumKeys) {
       return toBuilder().setRedistributeNumKeys(redistributeNumKeys).build();
     }
 
+    /**
+     * <p>Hints to the runner to optimize the redistribute by minimizing the 
amount of data required
+     * for persistence as part of the redistribute operation.
+     *
+     * <p>Must be used with {@link KafkaIO#withRedistribute()}.
+     *
+     * <p>Not compatible with {@link KafkaIO#withAllowDuplicates()}.
+     */

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The current Javadoc is a bit vague and focuses on the implementation detail 
("minimizing the amount of data required for persistence") rather than the 
user-facing behavior. It would be more helpful to explain what "offset 
deduplication" means and what its effects are.
   
   Also, the method accepts a `Boolean` and `null` is treated as `false`. This 
behavior should be documented in a `@param` tag for clarity.
   
   Here is a suggested improvement:
   
   ```suggestion
        * <p>Hints to the runner to perform offset-based deduplication during 
redistribution. This can
        * be used to achieve exactly-once processing guarantees and may 
optimize the redistribute
        * operation by minimizing the amount of data required for persistence.
        *
        * <p>Must be used with {@link KafkaIO#withRedistribute()}.
        *
        * <p>Not compatible with {@link KafkaIO#withAllowDuplicates(Boolean)} 
set to {@code true}.
        *
        * @param offsetDeduplication specifies whether to enable offset-based 
deduplication. A {@code null}
        *     value is treated as {@code false}.
        * @return an updated {@link Read} transform.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] Add javadoc comments for public methods related to KafkaIO redistribute. [beam]

Reply via email to