Stavros Kontopoulos created SPARK-27549:
-------------------------------------------

             Summary: Commit Kafka Source offsets to facilitate external tooling
                 Key: SPARK-27549
                 URL: https://issues.apache.org/jira/browse/SPARK-27549
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Stavros Kontopoulos


Tools monitoring consumer lag could benefit from having the option of saving 
the source offsets. Sources implement 
org.apache.spark.sql.sources.v2.reader.streaming.SparkDataStream, but 
KafkaMicroBatchStream currently [does not 
commit|https://github.com/apache/spark/blob/5bf5d9d854db53541956dedb03e2de8eecf65b81/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L170]
 anything (as expected), so we could expand that.

Other streaming engines like 
[Flink|https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-offset-committing-behaviour-configuration]
 allow you to enable `auto.commit` at the expense of not having checkpointing.

Here the proposal is to allow committing the source offsets back to Kafka 
whenever progress has been made.

I am also aware that another option would be to use a StreamingQueryListener, 
intercept completed batches, and then write the offsets anywhere you need to, 
but it would be great if the Kafka integration with Structured Streaming could 
do some of this work anyway.
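To make the listener workaround concrete: in onQueryProgress the Kafka source reports its endOffset as JSON of the form {"topic":{"partition":offset}}, which one would have to parse before committing. A minimal stdlib-only sketch of that parsing step (KafkaOffsetJson is an illustrative helper, not Spark API):

```scala
// Illustrative helper (not part of Spark): parse the Kafka source's
// endOffset JSON, e.g. {"topic":{"0":100,"1":200}}, into a
// (topic, partition) -> offset map using only the Scala stdlib.
object KafkaOffsetJson {
  // one match per topic: "topic":{ ...partition/offset pairs... }
  private val TopicRe = """"([^"]+)"\s*:\s*\{([^}]*)\}""".r
  // one match per partition/offset pair inside a topic: "0":100
  private val PartRe = """"(\d+)"\s*:\s*(\d+)""".r

  def parse(json: String): Map[(String, Int), Long] =
    TopicRe.findAllMatchIn(json).flatMap { m =>
      val topic = m.group(1)
      PartRe.findAllMatchIn(m.group(2)).map { p =>
        (topic, p.group(1).toInt) -> p.group(2).toLong
      }
    }.toMap
}
```

A StreamingQueryListener could then read event.progress.sources, parse each endOffset like this, and commit the result with a plain KafkaConsumer via commitSync, using a group id of its own choosing so lag-monitoring tools can pick it up.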

[~c...@koeninger.org]  [~marmbrus] what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
