Hi all,

This started as a discussion on Stack Overflow:
https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api

The problem is that the public API has no support for obtaining the Kafka
(or Kinesis) offsets. For example, if you want to save offsets to external
storage from a custom sink, you have to:
1) preserve topic, partition and offset across all transform operations on
the Dataset (based on the hard-coded Kafka schema)
2) manually group by topic/partition and aggregate the max offset
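
To make step 2 concrete, here is a rough sketch in plain Python (no Spark
involved) of the aggregation a custom sink ends up doing by hand. The
Record type and the max_offsets helper are hypothetical names made up for
this illustration; only the field names topic, partition and offset mirror
the columns of the Kafka source schema.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    # Hypothetical row type; the fields mirror the topic, partition and
    # offset columns that have to be carried through every transformation.
    topic: str
    partition: int
    offset: int


def max_offsets(records: Iterable[Record]) -> Dict[Tuple[str, int], int]:
    """Return the highest offset seen for each (topic, partition) pair."""
    latest: Dict[Tuple[str, int], int] = {}
    for r in records:
        key = (r.topic, r.partition)
        # Keep only the largest offset per topic/partition.
        if key not in latest or r.offset > latest[key]:
            latest[key] = r.offset
    return latest


# Example batch as a sink might receive it (made-up values).
batch = [
    Record("events", 0, 41),
    Record("events", 1, 17),
    Record("events", 0, 42),
]
offsets_to_save = max_offsets(batch)
```

The resulting mapping is what would then be written to the external store,
and it only works if none of the upstream transformations dropped those
three columns.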

The Structured Streaming doc says "Every streaming source is assumed to have
offsets", so why aren't they part of the public API? What do you think about
supporting this?

Dmitry
