[ 
https://issues.apache.org/jira/browse/SPARK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064721#comment-14064721
 ] 

Saisai Shao commented on SPARK-2492:
------------------------------------

Hi TD, 

I also ran some experiments on the previous code. There, the ZooKeeper group 
metadata is cleaned whenever auto.offset.reset is set, whether to smallest or 
largest. This leads to two results:

1. smallest: we always read data from the beginning of the partition, whether 
the groupid is new or old.
2. largest: we always read data from the end of the partition, whether the 
groupid is new or old.

I think the reason is that we delete the group metadata in ZooKeeper, so Kafka 
can only rely on auto.offset.reset to position the offset.

If we do not remove the ZooKeeper metadata, the result becomes:

1. smallest: for a new groupid we read from the beginning of the partition; 
for an old groupid, the starting point is the last committed offset.
2. largest: for a new groupid we read from the end of the partition; for an 
old groupid, the starting point is the last committed offset.
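
The two sets of results above can be summarized with a small model. This is 
only an illustrative sketch, not the KafkaReceiver or Kafka code: the function 
start_offset and its parameters are hypothetical names, and offsets are plain 
integers for a single partition. It assumes Kafka 0.8's documented meaning of 
auto.offset.reset, i.e. it applies only when there is no committed offset or 
the committed offset is out of range.

```python
from typing import Optional


def start_offset(reset: str,
                 committed: Optional[int],
                 earliest: int,
                 latest: int,
                 clean_zk_metadata: bool) -> int:
    """Model where a consumer group starts reading one partition.

    Hypothetical helper for illustration; not part of Kafka or Spark.
    `committed` is the group's last committed offset (None for a new groupid).
    """
    if reset not in ("smallest", "largest"):
        raise ValueError("reset must be 'smallest' or 'largest'")

    # Previous code path: the group metadata in ZooKeeper is deleted,
    # so any committed offset is lost before the consumer starts.
    if clean_zk_metadata:
        committed = None

    # A valid committed offset always wins.
    if committed is not None and earliest <= committed <= latest:
        return committed

    # Otherwise fall back to auto.offset.reset.
    return earliest if reset == "smallest" else latest


if __name__ == "__main__":
    # Old groupid with committed offset 42, partition range [0, 100].
    for clean in (True, False):
        for reset in ("smallest", "largest"):
            print(clean, reset, start_offset(reset, 42, 0, 100, clean))
```

With clean_zk_metadata=True the committed offset never matters, which matches 
the first list; with clean_zk_metadata=False only a new groupid (or an 
out-of-range offset) falls back to auto.offset.reset, which matches the second.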

So I think that in the previous code "auto.offset.reset" is not a hint for 
out-of-range seeking; it immediately forces the offset to the beginning or end 
of the partition. I'm not sure what the purpose of the previous design was.

I think directly seeking to the beginning or end of the partition whenever 
"auto.offset.reset" is set serves a different purpose from Kafka's own 
behavior, and will lead to unwanted results when people set this parameter 
(because it differs from Kafka's predefined meaning). So I'd prefer to remove 
this code path.

What are your thoughts and concerns?




> KafkaReceiver minor changes to align with Kafka 0.8 
> ----------------------------------------------------
>
>                 Key: SPARK-2492
>                 URL: https://issues.apache.org/jira/browse/SPARK-2492
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>            Priority: Minor
>             Fix For: 1.1.0
>
>
> Update to delete Zookeeper metadata when Kafka's parameter 
> "auto.offset.reset" is set to "smallest", which is aligned with Kafka 0.8's 
> ConsoleConsumer.
> Also use Kafka offered API without directly using zkClient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
