[ https://issues.apache.org/jira/browse/SPARK-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695939#comment-14695939 ]
Cody Koeninger commented on SPARK-6249:
---------------------------------------
Regarding streaming stats, those should be available in the current release.
> Get Kafka offsets from consumer group in ZK when using direct stream
> --------------------------------------------------------------------
>
> Key: SPARK-6249
> URL: https://issues.apache.org/jira/browse/SPARK-6249
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Reporter: Tathagata Das
>
> This is the proposal.
> The simpler direct API (the one that does not take explicit offsets) can be
> modified to also pick up the initial offset from ZK if group.id is specified.
> This is very similar to how we find the latest or earliest offset in that
> API, except that instead of the latest/earliest offset of the topic we want
> to find the offset from the consumer group. The group offsets in ZK are not
> used at all for any further processing or restarting, so the exactly-once
> semantics are not broken.
> The use case where this is useful is simplified code upgrade. If the user
> wants to upgrade the code, he/she can stop the context gracefully, which will
> ensure the ZK consumer group offset is updated with the last offsets
> processed. Then the new code, when started (not restarted from checkpoint),
> can pick up the consumer group offset from ZK and continue where the previous
> code left off.
> Without the functionality of picking up consumer group offsets at startup
> (that is, currently), the only way to do this is for users to save the
> offsets somewhere (file, database, etc.) and manage the offsets themselves.
> I just want to simplify this process.
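
For illustration only, the offset selection described in the proposal could be sketched roughly as below. This is not Spark or Kafka code: `resolve_start_offsets`, its parameters, and the plain dictionaries standing in for ZK and broker lookups are all hypothetical, and falling back when a committed offset lies outside the available range is an assumption that the proposal does not spell out.

```python
from typing import Dict, Optional, Tuple

# (topic, partition); a hypothetical stand-in for a Kafka topic-partition key.
TopicPartition = Tuple[str, int]


def resolve_start_offsets(
    earliest: Dict[TopicPartition, int],
    latest: Dict[TopicPartition, int],
    group_offsets: Optional[Dict[TopicPartition, int]] = None,
    auto_offset_reset: str = "largest",
) -> Dict[TopicPartition, int]:
    """Pick a starting offset for each partition.

    `earliest` and `latest` are assumed to cover the same partitions.
    `group_offsets` plays the role of the consumer group offsets read from
    ZK; it is None when no group.id was specified.
    """
    if auto_offset_reset not in ("smallest", "largest"):
        raise ValueError("auto_offset_reset must be 'smallest' or 'largest'")

    # Existing behaviour: start from the earliest or latest offset of the topic.
    fallback = earliest if auto_offset_reset == "smallest" else latest
    committed = group_offsets or {}

    start: Dict[TopicPartition, int] = {}
    for tp, default in fallback.items():
        offset = committed.get(tp)
        # Assumption: a missing or out-of-range group offset falls back to
        # the earliest/latest default instead of failing.
        if offset is None or not (earliest[tp] <= offset <= latest[tp]):
            offset = default
        start[tp] = offset
    return start


if __name__ == "__main__":
    earliest = {("events", 0): 0, ("events", 1): 5}
    latest = {("events", 0): 100, ("events", 1): 50}
    # Only partition 0 has a group offset; partition 1 uses the default.
    print(resolve_start_offsets(earliest, latest, {("events", 0): 42}))
```

In this sketch the group offsets are consulted only once, to choose the starting point, which mirrors the statement in the proposal that they play no part in further processing or restarting.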