Doubt on KafkaIO-SDF / KafkaLatestOffsetEstimator

Marco Robles Wed, 22 Sep 2021 09:47:55 -0700

Hi folks,

While looking at the Kafka SDF implementation, I saw that there is a
KafkaLatestOffsetEstimator
<https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java#L202>
which estimates the latest offset available in a TopicPartition. This
implementation uses *Suppliers.memoizeWithExpiration()*, and the javadoc
<https://javadoc.io/static/org.apache.beam/beam-sdks-java-core/2.29.0/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.RangeEndEstimator.html>
says that if estimate() is too expensive to compute, you can use
memoizeWithExpiration(). My question is: why is checking the latest offset
expensive in the Kafka use case? I am assuming it is expensive because the
range estimator needs to check the latest offset constantly. Am I right?
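
To make the question concrete, here is a minimal sketch of the caching idea
behind memoizeWithExpiration(). It is not the Beam or Guava code: the class
names, the 1-second expiry, and the fake broker lookup are all hypothetical,
and it assumes that each uncached estimate() would cost one round trip to the
broker. It uses only the JDK so it runs without Kafka or Guava on the
classpath.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class MemoizedEstimateSketch {

    /** Caches the delegate's value and recomputes it only once the TTL has elapsed. */
    static final class ExpiringSupplier<T> implements Supplier<T> {
        private final Supplier<T> delegate;
        private final long ttlNanos;
        private final LongSupplier clock; // injectable clock (hypothetical), avoids sleeping in tests
        private T value;
        private long expiresAt;
        private boolean initialized;

        ExpiringSupplier(Supplier<T> delegate, long ttl, TimeUnit unit, LongSupplier clock) {
            this.delegate = delegate;
            this.ttlNanos = unit.toNanos(ttl);
            this.clock = clock;
        }

        @Override
        public synchronized T get() {
            long now = clock.getAsLong();
            // Recompute on first use, or when the cached value has expired.
            if (!initialized || now - expiresAt >= 0) {
                value = delegate.get();
                expiresAt = now + ttlNanos;
                initialized = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        AtomicInteger brokerCalls = new AtomicInteger();
        long[] fakeNow = {0L};

        // Stand-in for the expensive lookup (assumption: one broker round trip per call).
        Supplier<Long> fakeBrokerLookup = () -> 1000L + brokerCalls.incrementAndGet();

        Supplier<Long> latestOffset =
            new ExpiringSupplier<>(fakeBrokerLookup, 1, TimeUnit.SECONDS, () -> fakeNow[0]);

        // Many estimate() calls inside the expiry window hit the "broker" only once.
        for (int i = 0; i < 1000; i++) {
            latestOffset.get();
        }
        System.out.println("broker calls within window: " + brokerCalls.get());

        // After the window expires, the next call refreshes the cached value.
        fakeNow[0] += TimeUnit.SECONDS.toNanos(1);
        latestOffset.get();
        System.out.println("broker calls after expiry: " + brokerCalls.get());
    }
}
```

If my assumption holds, the trade-off is that the estimate can be stale by up
to the expiry interval in exchange for far fewer lookups.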


-- 

*Marco Robles* *|* WIZELINE

Software Engineer

