[
https://issues.apache.org/jira/browse/SPARK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064721#comment-14064721
]
Saisai Shao commented on SPARK-2492:
------------------------------------
Hi TD,
I also ran some experiments with the previous code. In the previous code, the
ZooKeeper group metadata is deleted whenever auto.offset.reset is set, whether it
is smallest or largest. This leads to two results:
1. smallest: we always read data from the beginning of the partition, whether
the group ID is new or old.
2. largest: we always read data from the end of the partition, whether the
group ID is new or old.
I think the reason is that, because we delete the group metadata in ZooKeeper,
Kafka can only rely on auto.offset.reset to position the offset.
If we do not remove the ZooKeeper metadata, the results become:
1. smallest: a new group ID reads from the beginning of the partition; an old
group ID starts from its last committed offset.
2. largest: a new group ID reads from the end of the partition; an old group ID
starts from its last committed offset.
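The two experiments can be reproduced with a small model of how the start offset is chosen. This is a minimal sketch, not Spark or Kafka code: `Partition`, `resolve_start_offset`, `start_offset` and the `zk_offsets` dict are hypothetical names, and the dict is an assumed in-memory stand-in for the committed offsets the Kafka 0.8 high-level consumer keeps in ZooKeeper. The fallback rule follows Kafka 0.8's documented meaning of auto.offset.reset (applied when there is no committed offset or it is out of range), while `delete_group_metadata=True` models the previous receiver wiping the group's metadata.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Partition:
    """Hypothetical stand-in for one topic partition's available offset range."""
    log_start: int  # earliest offset still retained on the broker
    log_end: int    # offset the next produced message will receive


def resolve_start_offset(committed: Optional[int], partition: Partition, reset: str) -> int:
    # Kafka 0.8 semantics (as assumed here): a committed offset wins if it is
    # still in range; auto.offset.reset is only a fallback for a missing or
    # out-of-range committed offset.
    if committed is not None and partition.log_start <= committed <= partition.log_end:
        return committed
    if reset == "smallest":
        return partition.log_start
    if reset == "largest":
        return partition.log_end
    raise ValueError(f"unsupported auto.offset.reset value: {reset!r}")


def start_offset(zk_offsets: Dict[str, int], group_id: str, partition: Partition,
                 reset: str, delete_group_metadata: bool) -> int:
    # zk_offsets is a hypothetical stand-in for per-group offsets in ZooKeeper.
    if delete_group_metadata:
        # Previous receiver behavior: wipe the group's metadata whenever
        # auto.offset.reset is set, so no committed offset survives.
        zk_offsets.pop(group_id, None)
    return resolve_start_offset(zk_offsets.get(group_id), partition, reset)


if __name__ == "__main__":
    p = Partition(log_start=0, log_end=100)
    for delete in (True, False):
        for reset in ("smallest", "largest"):
            for group in ("old-group", "new-group"):
                zk = {"old-group": 42}  # only the old group has committed before
                print(delete, reset, group, start_offset(zk, group, p, reset, delete))
```

With deletion enabled, both groups start at 0 for smallest and at 100 for largest; without deletion, the old group resumes at 42 under either setting and only the new group follows auto.offset.reset.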
So I think that in the previous code, "auto.offset.reset" is not a hint for
out-of-range seeking but an immediate instruction to seek to the beginning or
end of the partition. I'm not sure what the purpose of the previous design
was.
I think that directly seeking to the beginning or end of the partition whenever
"auto.offset.reset" is set serves a different purpose from Kafka's own
behavior, and will lead to unwanted results when people set this parameter
expecting Kafka's predefined meaning. So I'd prefer to remove this code path.
What are your thoughts and concerns?
> KafkaReceiver minor changes to align with Kafka 0.8
> ----------------------------------------------------
>
> Key: SPARK-2492
> URL: https://issues.apache.org/jira/browse/SPARK-2492
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.0.0
> Reporter: Saisai Shao
> Assignee: Saisai Shao
> Priority: Minor
> Fix For: 1.1.0
>
>
> Update to delete ZooKeeper metadata when Kafka's parameter
> "auto.offset.reset" is set to "smallest", which is aligned with Kafka 0.8's
> ConsoleConsumer.
> Also use the Kafka-provided API instead of using zkClient directly.