[ https://issues.apache.org/jira/browse/SPARK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069727#comment-14069727 ]

Saisai Shao commented on SPARK-2492:
------------------------------------

Hi Tobias, 

I agree with you. Though I do not know the behavior of "autooffset.reset" in 
0.7, I think we should keep KafkaInputDStream consistent with the documented 
behavior of "auto.offset.reset" in 0.8.
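For reference, the 0.8 setting under discussion lives in the high-level consumer config, roughly as below (an illustrative fragment; the zookeeper.connect and group.id values are placeholders):

```properties
# Kafka 0.8 high-level consumer config (illustrative)
zookeeper.connect=localhost:2181
group.id=spark-streaming-consumer
# Applies only when no committed offset exists (or it is out of range):
#   "largest"  (default) - start from the latest offset
#   "smallest"           - start from the earliest available offset
auto.offset.reset=largest
```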

As you mentioned:

{quote}
it seems like a common requirement not to use the offset stored in Zookeeper, 
even though it's valid (for example, in order not to overload the Spark 
Streaming receiver with a huge number of items on startup).
{quote}

I think we can offer a flag that lets the user choose whether to abandon old 
data, and delete the ZK metadata according to that flag. But this also relies 
on "auto.offset.reset" = "largest", which is the default in 0.8; if it is set 
to "smallest", we will read all the data from the beginning even after 
deleting the ZK metadata. 
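The cleanup behind such a flag amounts to removing the consumer group's committed-offset subtree in ZooKeeper before the receiver starts, so the consumer falls back to "auto.offset.reset". A minimal sketch of the path construction only (the layout follows Kafka 0.8's /consumers/&lt;group&gt;/offsets/&lt;topic&gt; convention; the actual delete would go through Kafka's offered API rather than a raw zkClient, and the group/topic names here are examples):

```java
// Sketch: where Kafka 0.8's high-level consumer commits offsets in ZooKeeper.
// Deleting this subtree discards the stored offsets for the group, so the
// next startup is governed entirely by "auto.offset.reset".
public class OffsetPaths {
    // Root of all committed offsets for a consumer group.
    static String groupOffsetsDir(String group) {
        return "/consumers/" + group + "/offsets";
    }

    // Per-topic offset directory (one child znode per partition underneath).
    static String topicOffsetsDir(String group, String topic) {
        return groupOffsetsDir(group) + "/" + topic;
    }
}
```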

Besides, what you mentioned in SPARK-2383 is a real issue we should take care 
of.

Thanks a lot for your advice :).




> KafkaReceiver minor changes to align with Kafka 0.8 
> ----------------------------------------------------
>
>                 Key: SPARK-2492
>                 URL: https://issues.apache.org/jira/browse/SPARK-2492
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>            Priority: Minor
>             Fix For: 1.1.0
>
>
> Update to delete Zookeeper metadata when Kafka's parameter 
> "auto.offset.reset" is set to "smallest", which is aligned with Kafka 0.8's 
> ConsoleConsumer.
> Also use Kafka offered API without directly using zkClient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
