[
https://issues.apache.org/jira/browse/SPARK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064636#comment-14064636
]
Saisai Shao commented on SPARK-2492:
------------------------------------
Hi TD,
I revisit the Kafka's ConsoleConsumer carefully, I start to doubt the purpose
of this tricky modification.
When "auto.offset.reset" = "small", consumer offset will seek to the beginning
of the partition *only* when *current offset is out of range*.
Delete zookeeper metadata will *force* reading data for beginning of partition
*immediately*.
So actually only when we want to explicitly fetch data from beginning like
ConsoleConsumer which specify parameter "--from-beginning", we need not to
delete zookeeper metadata explicitly.
Also I revisit the previous PR about this part
([https://github.com/mesos/spark/pull/527]) carefully, seems I redo the same
thing as this
[commit|https://github.com/Reinvigorate/spark/commit/cfa8e769a86664722f47182fa572179e8beadcb7].
I'm not sure what's original purpose of deleting zookeeper metadata not matter
"auto.offset.reset" is "smallset" or "largest"? After rethinking about this
part, I think this tricky hack is needless, only when we need to read data from
beginning immediately, we need to delete this metadata. So I'm not sure if you
know the original purpose.
Sorry for immature thought and PR, if you think it's no need to modify I will
close this PR.
> KafkaReceiver minor changes to align with Kafka 0.8
> ----------------------------------------------------
>
> Key: SPARK-2492
> URL: https://issues.apache.org/jira/browse/SPARK-2492
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.0.0
> Reporter: Saisai Shao
> Assignee: Saisai Shao
> Priority: Minor
> Fix For: 1.1.0
>
>
> Update to delete Zookeeper metadata when Kafka's parameter
> "auto.offset.reset" is set to "smallest", which is aligned with Kafka 0.8's
> ConsoleConsumer.
> Also use Kafka offered API without directly using zkClient.
--
This message was sent by Atlassian JIRA
(v6.2#6252)