Hi all,

This started as a discussion on Stack Overflow: https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api
The problem is that the public API provides no way to obtain the Kafka (or Kinesis) offsets. For example, to save offsets to external storage from a custom sink, you have to:

1) preserve the topic, partition and offset columns through all transformations of the Dataset (relying on the hard-coded Kafka schema);
2) manually group by topic and partition and aggregate the maximum offset.

The Structured Streaming documentation says "Every streaming source is assumed to have offsets", so why are they not part of the public API? What do you think about supporting this?

Dmitry
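For reference, step 2 of the workaround amounts to a "max offset per topic-partition" aggregation. Below is a minimal sketch in plain Python, assuming the rows of a micro-batch still carry the `topic`, `partition` and `offset` columns of the Kafka source schema. The `Row` type and the `max_offsets_per_partition` helper are hypothetical illustrations, not Spark API; inside a real sink the same logic would be a `groupBy` on topic and partition with a `max` aggregate on offset.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Row(NamedTuple):
    # Hypothetical record shape: the Kafka columns that had to be
    # carried through every transformation, plus the payload.
    topic: str
    partition: int
    offset: int
    value: str


def max_offsets_per_partition(rows: Iterable[Row]) -> Dict[Tuple[str, int], int]:
    """Group rows by (topic, partition) and keep the highest offset in each group."""
    result: Dict[Tuple[str, int], int] = {}
    for row in rows:
        key = (row.topic, row.partition)
        # Keep only the maximum offset seen for this topic-partition.
        if key not in result or row.offset > result[key]:
            result[key] = row.offset
    return result


batch = [
    Row("events", 0, 41, "a"),
    Row("events", 0, 42, "b"),
    Row("events", 1, 7, "c"),
]
print(max_offsets_per_partition(batch))  # {('events', 0): 42, ('events', 1): 7}
```

Note that Kafka's own convention for a committed offset is the next offset to consume, i.e. the last processed offset plus one, so a sink storing these values may need to add one.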