[
https://issues.apache.org/jira/browse/SPARK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069727#comment-14069727
]
Saisai Shao commented on SPARK-2492:
------------------------------------
Hi Tobias,
I agree with you. Though I do not know the behavior of "autooffset.reset" in
0.7, I think we should keep KafkaInputDStream consistent with the documented
behavior of "auto.offset.reset" in 0.8.
As you mentioned:
{quote}
it seems like a common requirement not to use the offset stored in Zookeeper,
even though it's valid (for example, in order not to overload the Spark
Streaming receiver with a huge number of items on startup).
{quote}
I think we can offer a flag for the user to choose whether to abandon old
data, and delete the ZK metadata according to that flag. But this also relies
on "auto.offset.reset" = "largest", which is the default in 0.8; if it is set
to "smallest", we will read all the data from the beginning even after
deleting the ZK metadata.
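As a minimal sketch of the interaction described above (the flag name and helper are hypothetical illustrations, not the actual KafkaInputDStream API): after the flag triggers deletion of the consumer group's ZK offset metadata, the value of "auto.offset.reset" decides where consumption resumes, so abandoning old data only works together with "largest".

```java
import java.util.Properties;

public class KafkaReceiverConfig {
    // "abandonOldData" is a hypothetical flag name, used here only to
    // illustrate the proposal; it is not part of the real API.
    public static Properties consumerProps(boolean abandonOldData) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "spark-streaming-group");
        // With no valid offset in ZK (e.g. after the metadata was deleted),
        // "largest" (the 0.8 default) starts from new messages only, while
        // "smallest" replays the topic from the beginning.
        props.put("auto.offset.reset", abandonOldData ? "largest" : "smallest");
        return props;
    }
}
```

The point of the sketch is that deleting the ZK metadata alone is not enough: if the user leaves "auto.offset.reset" at "smallest", the receiver still re-reads the whole topic.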
Besides, what you mentioned in SPARK-2383 is a real issue we should take care
of.
Thanks a lot for your advice :).
> KafkaReceiver minor changes to align with Kafka 0.8
> ----------------------------------------------------
>
> Key: SPARK-2492
> URL: https://issues.apache.org/jira/browse/SPARK-2492
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.0.0
> Reporter: Saisai Shao
> Assignee: Saisai Shao
> Priority: Minor
> Fix For: 1.1.0
>
>
> Update to delete Zookeeper metadata when Kafka's parameter
> "auto.offset.reset" is set to "smallest", which aligns with Kafka 0.8's
> ConsoleConsumer.
> Also use Kafka's offered API instead of directly using zkClient.
--
This message was sent by Atlassian JIRA
(v6.2#6252)