sjvanrossum commented on code in PR #34201:
URL: https://github.com/apache/beam/pull/34201#discussion_r2095512482


##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java:
##########
@@ -173,19 +172,6 @@ public boolean advance() throws IOException {
         elementsReadBySplit.inc();
 
         ConsumerRecord<byte[], byte[]> rawRecord = pState.recordIter.next();
-        long expected = pState.nextOffset;
-        long offset = rawRecord.offset();
-
-        if (offset < expected) { // -- (a)
-          // this can happen when compression is enabled in Kafka (seems to be 
fixed in 0.10)

Review Comment:
   If I recall correctly, the Beam module Gradle plugin forces dependency 
versions, but the dependency scope for Kafka clients is set to `provided` so it 
doesn't apply for the `beam-sdks-java-io-kafka` artifact and the project build 
file tests against these versions of the Kafka client library: 
   ```
   def kafkaVersions = [
       '201': "2.0.1",
       '231': "2.3.1",
       '241': "2.4.1",
       '251': "2.5.1",
       '282': "2.8.2",
       '312': "3.1.2",
       '390': "3.9.0",
   ]
   ```
   
   I don't think that's how we should test backwards compatibility since 
there's built-in support for version ranges in both POM and Gradle metadata. In 
any case, we're miles ahead of the version mentioned in this comment and I've 
gone back to check on 0.11.0+ and found that the component responsible for 
tracking this (from the fetch collector to the poll loop) hasn't changed much 
during that time. I cross-referenced a few other Kafka source integrations 
(Flink, Spark, Venice) and only Spark (pinned to client version 0.10.x) still 
has this type of safeguard in place 
([source](https://github.com/apache/spark/blob/v3.5.5/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala#L131)).
   
   We can also deprecate all those older client library versions at some point 
since every release <4.0.0 still supports communication with brokers >=0.8.0. 
Client library versions >=4.0.0 bump that up to broker versions >=2.1.0 so if 
there's reason to build against multiple client library versions it should 
simply be 3.x.y and 4.x.y going forward. :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to