[ https://issues.apache.org/jira/browse/SPARK-27549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-27549:
------------------------------------
Assignee: (was: Apache Spark)
> Commit Kafka Source offsets to facilitate external tooling
> ----------------------------------------------------------
>
> Key: SPARK-27549
> URL: https://issues.apache.org/jira/browse/SPARK-27549
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Stavros Kontopoulos
> Priority: Major
>
> Tools that monitor consumer lag could benefit from having the option of saving
> the source offsets. Sources implement
> org.apache.spark.sql.sources.v2.reader.streaming.SparkDataStream, and
> KafkaMicroBatchStream currently [does not
> commit|https://github.com/apache/spark/blob/5bf5d9d854db53541956dedb03e2de8eecf65b81/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L170]
> anything (its commit method is a no-op), so we could expand it.
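> As a rough, non-authoritative sketch of what such an expansion could look like
> (the `consumer` and `endPartitionOffsets` parameters are assumptions for the
> example, not existing KafkaMicroBatchStream fields), committing the end
> offsets of a finished batch boils down to translating them into the Kafka
> consumer's commit API:
> {code:scala}
> import java.util.{Map => JMap}
> import scala.collection.JavaConverters._
>
> import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
> import org.apache.kafka.common.TopicPartition
>
> // Hypothetical helper: push the end offsets of a completed micro-batch to the
> // consumer group so that lag-monitoring tools can observe them.
> def commitToKafka(
>     consumer: KafkaConsumer[_, _],
>     endPartitionOffsets: Map[TopicPartition, Long]): Unit = {
>   val toCommit: JMap[TopicPartition, OffsetAndMetadata] =
>     endPartitionOffsets.map { case (tp, offset) =>
>       tp -> new OffsetAndMetadata(offset)
>     }.asJava
>   // Synchronous commit; an asynchronous commit would also be an option.
>   consumer.commitSync(toCommit)
> }
> {code}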
> Other streaming engines like
> [Flink|https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-offset-committing-behaviour-configuration]
> allow you to enable `auto.commit` at the expense of not having checkpointing.
> The proposal here is to commit the source offsets whenever progress has been
> made, i.e. after each completed micro-batch.
> I am also aware that another option would be to register a
> StreamingQueryListener, intercept batch completions, and write the offsets
> wherever you need them (sketched below), but it would be great if the Kafka
> integration in Structured Streaming could do some of this work out of the box.
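> For completeness, a minimal sketch of that listener-based workaround, assuming
> a hypothetical `publishOffsets` target for the external tooling (the
> `endOffset` field of each source progress is the JSON-encoded offsets the
> Kafka source reports):
> {code:scala}
> import org.apache.spark.sql.streaming.StreamingQueryListener
> import org.apache.spark.sql.streaming.StreamingQueryListener._
>
> class OffsetPublishingListener extends StreamingQueryListener {
>   override def onQueryStarted(event: QueryStartedEvent): Unit = ()
>   override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
>
>   // Called after every completed micro-batch.
>   override def onQueryProgress(event: QueryProgressEvent): Unit = {
>     event.progress.sources.foreach { source =>
>       // endOffset is a JSON string, e.g. {"topic":{"0":42,"1":17}}
>       publishOffsets(source.description, source.endOffset)
>     }
>   }
>
>   // Placeholder for whatever external store or tool consumes the offsets.
>   private def publishOffsets(source: String, offsetsJson: String): Unit =
>     println(s"$source -> $offsetsJson")
> }
>
> // Register it: spark.streams.addListener(new OffsetPublishingListener())
> {code}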
> [[email protected]] [~marmbrus] what do you think?