Naireen commented on code in PR #32344:
URL: https://github.com/apache/beam/pull/32344#discussion_r1755209437


##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java:
##########
@@ -2654,6 +2659,13 @@ public PCollection<KafkaRecord<K, V>> expand(PCollection<KafkaSourceDescriptor>
         if (getRedistributeNumKeys() == 0) {
           LOG.warn("This will create a key per record, which is sub-optimal for most use cases.");
         }
+        // is another check needed here for commit offsets?
+        if (isCommitOffsetEnabled() || configuredKafkaCommit()) {

Review Comment:
   Your explanation for isCommitOffsetEnabled makes sense.
   
   For configuredKafkaCommit, wouldn't the checkpoint always be behind the processed records? We read messages and then auto-commit (the default interval is every 5 seconds), so within that 5-second window, processing moves ahead of the last committed checkpoint?
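   The timing gap described above can be sketched in plain Java. This is a hypothetical simulation, not Beam or Kafka API code: it just models the argument that, with auto-commit firing every `auto.commit.interval.ms` (default 5000 ms), the last committed offset is whatever had been processed at the most recent commit tick, so processing runs ahead of the checkpoint between ticks.
   
   ```java
   // Hypothetical model of the consumer auto-commit lag discussed above.
   // Assumes a constant processing rate (offsetsPerMs) for simplicity.
   public class AutoCommitLag {
     static final long COMMIT_INTERVAL_MS = 5_000; // Kafka default auto.commit.interval.ms
   
     /** Offset captured by the most recent auto-commit tick at or before timeMs. */
     static long committedOffsetAt(long timeMs, long offsetsPerMs) {
       long lastTick = (timeMs / COMMIT_INTERVAL_MS) * COMMIT_INTERVAL_MS;
       return lastTick * offsetsPerMs;
     }
   
     /** Offset the pipeline has actually processed by timeMs. */
     static long processedOffsetAt(long timeMs, long offsetsPerMs) {
       return timeMs * offsetsPerMs;
     }
   }
   ```
   
   At, say, t = 7000 ms with 1 record/ms, processing is at offset 7000 while the checkpoint still reflects offset 5000, so a restart would replay records 5000..7000 rather than resume exactly where processing stopped, which is the sense in which the checkpoint trails processed records.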



##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java:
##########
@@ -1697,11 +1696,17 @@ public PCollection<KafkaRecord<K, V>> expand(PBegin 
input) {
         }
 
         if (kafkaRead.isRedistributed()) {
-          // fail here instead.
-          checkArgument(
-              kafkaRead.isCommitOffsetsInFinalizeEnabled(),
-              "commitOffsetsInFinalize() can't be enabled with isRedistributed");
+          if (kafkaRead.isCommitOffsetsInFinalizeEnabled()) {
+            LOG.warn(
+                "commitOffsetsInFinalize() will not capture all work processed if set with withRedistribute()");

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
