Re: Easy way to get offset metadata with Spark Streaming API

Cody Koeninger Mon, 11 Sep 2017 11:01:49 -0700

https://issues-test.apache.org/jira/browse/SPARK-18258


On Mon, Sep 11, 2017 at 7:15 AM, Dmitry Naumenko <[email protected]> wrote:
> Hi all,
>
> It started as a discussion in
> https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api.
>
> The problem is that the public API offers no way to obtain the Kafka
> (or Kinesis) offsets. For example, if you want to save offsets to external
> storage in a custom sink, you have to:
> 1) preserve topic, partition and offset across all transform operations on
> the Dataset (based on the hard-coded Kafka schema)
> 2) manually group by partition and aggregate the max offset
>
> The Structured Streaming doc says "Every streaming source is assumed to have
> offsets", so why isn't this part of the public API? What do you think about
> supporting it?
>
> Dmitry
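
For readers of the archive, the aggregation in step 2 of the quoted workaround can be sketched in plain Python. This is only an illustration of the logic, not Spark code and not an API from this thread: `KafkaRecord` and `max_offsets` are hypothetical names, and the sample batch is made up. It assumes each record still carries the `topic`, `partition` and `offset` columns of the Kafka source schema, and it groups by topic as well as partition so that two topics with the same partition number stay separate.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class KafkaRecord(NamedTuple):
    """Hypothetical stand-in for one row that kept the Kafka metadata columns."""
    topic: str
    partition: int
    offset: int
    value: bytes


def max_offsets(records: Iterable[KafkaRecord]) -> Dict[Tuple[str, int], int]:
    """Return the highest offset seen for each (topic, partition) pair."""
    latest: Dict[Tuple[str, int], int] = {}
    for rec in records:
        key = (rec.topic, rec.partition)
        # Keep the largest offset per key; records may arrive in any order.
        if key not in latest or rec.offset > latest[key]:
            latest[key] = rec.offset
    return latest


# Made-up batch: two records from partition 0 and one from partition 1.
batch = [
    KafkaRecord("events", 0, 41, b"a"),
    KafkaRecord("events", 0, 42, b"b"),
    KafkaRecord("events", 1, 7, b"c"),
]

# These per-partition maxima are what a custom sink would persist externally.
offsets_to_save = max_offsets(batch)
```

In Spark itself the equivalent would presumably be a `groupBy` on the topic and partition columns followed by a `max` over the offset column, which only works if step 1 has kept those columns through every transformation.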

---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]
