bvolpato commented on code in PR #27702:
URL: https://github.com/apache/beam/pull/27702#discussion_r1276717799
##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##########
@@ -311,17 +324,26 @@ public OffsetRangeTracker restrictionTracker(
if (restriction.getTo() < Long.MAX_VALUE) {
return new OffsetRangeTracker(restriction);
}
- Map<String, Object> updatedConsumerConfig =
- overrideBootstrapServersConfig(consumerConfig, kafkaSourceDescriptor);
- KafkaLatestOffsetEstimator offsetPoller =
- new KafkaLatestOffsetEstimator(
- consumerFactoryFn.apply(
- KafkaIOUtils.getOffsetConsumerConfig(
- "tracker-" + kafkaSourceDescriptor.getTopicPartition(),
- offsetConsumerConfig,
- updatedConsumerConfig)),
- kafkaSourceDescriptor.getTopicPartition());
- return new GrowableOffsetRangeTracker(restriction.getFrom(), offsetPoller);
+ final Map<TopicPartition, KafkaLatestOffsetEstimator> offsetEstimatorCacheInstance =
+ Preconditions.checkStateNotNull(this.offsetEstimatorCache);
+
+ TopicPartition topicPartition = kafkaSourceDescriptor.getTopicPartition();
+ KafkaLatestOffsetEstimator offsetEstimator = offsetEstimatorCacheInstance.get(topicPartition);
Review Comment:
I agree -- but TopicPartition comes as an element in a PCollection, so
changing the scope is more complicated and requires deeper thought and
testing.
There are some more fundamental changes needed to make that happen -- but
given the trouble this can cause for a few clusters, this feels like a good
patch to get in while we think about how to reduce the overhead of SDF (I
still see a bunch of short-lived consumers/connections for process
continuation, which didn't happen in the legacy read).
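The caching pattern in the diff above can be sketched as follows. This is a minimal, hypothetical illustration only: `TopicPartition` and `OffsetEstimator` here are simplified stand-ins for the Kafka client and Beam classes, not the real implementations, and the real Beam code uses `Preconditions.checkStateNotNull` on a cache built elsewhere.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a per-partition offset-estimator cache: one estimator (and thus
// one long-lived consumer) per TopicPartition, instead of creating a fresh
// consumer on every restrictionTracker() call.
public class OffsetEstimatorCacheSketch {

  // Simplified stand-in for org.apache.kafka.common.TopicPartition.
  record TopicPartition(String topic, int partition) {}

  // Simplified stand-in for ReadFromKafkaDoFn.KafkaLatestOffsetEstimator.
  static class OffsetEstimator {
    final TopicPartition tp;

    OffsetEstimator(TopicPartition tp) {
      this.tp = tp;
    }
  }

  private final Map<TopicPartition, OffsetEstimator> cache = new ConcurrentHashMap<>();

  // Returns the cached estimator for the partition, creating it only once.
  OffsetEstimator estimatorFor(TopicPartition tp) {
    return cache.computeIfAbsent(tp, OffsetEstimator::new);
  }

  public static void main(String[] args) {
    OffsetEstimatorCacheSketch sketch = new OffsetEstimatorCacheSketch();
    TopicPartition tp = new TopicPartition("topic", 0);
    // Repeated lookups for the same partition reuse the same cached instance.
    boolean reused = sketch.estimatorFor(tp) == sketch.estimatorFor(tp);
    System.out.println("reused=" + reused);
  }
}
```

Keying the cache by `TopicPartition` is what makes the scope change discussed above tricky: the partition only becomes known per element, so the cache has to live at DoFn level rather than being wired in at pipeline construction.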