Dan Dutrow created SPARK-9947:
---------------------------------
Summary: Separate Metadata and State Checkpoint Data
Key: SPARK-9947
URL: https://issues.apache.org/jira/browse/SPARK-9947
Project: Spark
Issue Type: Improvement
Components: Streaming
Reporter: Dan Dutrow
This is the proposal.
The simpler direct API (the one that does not take explicit offsets) can be
modified to also pick up the initial offset from ZK if group.id is specified.
This is analogous to how we find the latest or earliest offset in that
API, except that instead of the latest/earliest offset of the topic we look up
the offset stored for the consumer group. The group offsets in ZK are not used
at all for any further processing or restarting, so the exactly-once semantics
are not broken.
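
The lookup order described above can be sketched roughly as follows. This is
only an illustration in plain Python, not Spark code: the function
resolve_initial_offsets, its parameters, and the dicts standing in for the ZK
group offsets and the topic's earliest/latest offsets are all hypothetical.
The "largest"/"smallest" values are meant to mirror Kafka's auto.offset.reset
setting.

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition)


def resolve_initial_offsets(
    earliest: Dict[TopicPartition, int],
    latest: Dict[TopicPartition, int],
    group_offsets: Optional[Dict[TopicPartition, int]] = None,
    auto_offset_reset: str = "largest",
) -> Dict[TopicPartition, int]:
    """Pick a starting offset for every partition (hypothetical helper).

    group_offsets stands in for whatever ZK holds for the configured
    group.id; pass None when no group.id is specified. Partitions with no
    stored group offset fall back to the latest/earliest offset.
    """
    if auto_offset_reset not in ("largest", "smallest"):
        raise ValueError("auto_offset_reset must be 'largest' or 'smallest'")

    fallback = latest if auto_offset_reset == "largest" else earliest
    stored = group_offsets or {}

    resolved = {}
    for tp, default_offset in fallback.items():
        # Prefer the consumer group's offset; otherwise use the topic default.
        resolved[tp] = stored.get(tp, default_offset)
    return resolved
```

In this sketch the group offsets are consulted only once, at startup, which
matches the point that they play no part in later processing or recovery.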
The use case where this is useful is a simplified code upgrade. If the user
wants to upgrade the code, they can stop the context gracefully, which will
ensure the ZK consumer group offset is updated with the last offsets processed.
Then the new code, when started (not restarted from checkpoint), can pick up
the consumer group offset from ZK and continue where the previous code left
off.
Without the ability to start from consumer group offsets (that is, currently),
the only way to do this is for users to save the offsets somewhere (file,
database, etc.) and manage them themselves. I just want to simplify this
process.
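
For comparison, the manual workaround users need today might look something
like the sketch below, which keeps offsets in a local JSON file. The helper
names save_offsets and load_offsets and the file layout are assumptions made
for illustration; a real job would call something like this after each batch
and again on startup.

```python
import json
import os
import tempfile
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition)


def save_offsets(path: str, offsets: Dict[TopicPartition, int]) -> None:
    """Write the last processed offsets to a JSON file (hypothetical helper)."""
    records = [
        {"topic": topic, "partition": partition, "offset": offset}
        for (topic, partition), offset in sorted(offsets.items())
    ]
    # Write to a temp file in the same directory, then rename over the
    # target, so a crash mid-write does not leave a truncated offsets file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(records, f)
    os.replace(tmp_path, path)


def load_offsets(path: str) -> Dict[TopicPartition, int]:
    """Read saved offsets; return an empty dict if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {(r["topic"], r["partition"]): r["offset"] for r in json.load(f)}
```

Every application has to carry some version of this bookkeeping, which is the
overhead the proposal aims to remove.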
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)