Yes!

> On Jun 28, 2017, at 11:00 PM, Zhechao Ma <mazhechaomaill...@gmail.com> wrote:
> 
> @Hugo
> 
> So, if has committed to Kafka, UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST
> are the same. They are just diffrernt if has never committed to Kafka.
> 
> Am I right?
> 
> 2017-06-29 1:06 GMT+08:00 Hugo Da Cruz Louro <hlo...@hortonworks.com>:
> 
>> Your KafkaConsumer instance (which boils down to your KafkaSpout) can be
>> in one of two states:
>> 
>> 1 - Has committed to Kafka
>> Here, EARLIEST fetches from the first offset. LATEST fetches from the last
>> offset. UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST will fetch from the
>> last committed offset - the  _earliest and _latest are “kinda” disregarded.
>> 
>> 2 - Has never committed to Kafka
>> EARLIEST is the same as UNCOMMITTED_EARLIEST
>> LATEST is the same as UNCOMMITTED_LATEST
>> 
>> What this means is that since this KafkaSpout (Kafka Consumer Instance)
>> has never committed to Kafka, when you do the first poll, from which offset
>> do you start. We give the option to start from the beginning (EARLIEST) or
>> from the end (LATEST).
>> 
>> Best,
>> Hugo
>> 
>> 
>>> On Jun 28, 2017, at 9:47 AM, Stig Døssing <generalbas....@gmail.com>
>> wrote:
>>> 
>>> No, the description is accurate.
>>> 
>>> EARLIEST and LATEST are for unconditionally starting at the beginning or
>>> end of the subscribed partitions. So if you configure a spout to use
>> either
>>> of these, it will start at the earliest or latest offset on each
>> partition
>>> every time you start it. Example: Say the spout is set to EARLIEST and
>>> we've just deployed it for the first time. The spout seeks to the
>> earliest
>>> offset (let's say 0) and emits offsets 0-100, and commits them (marking
>>> them as "done" in Kafka for the spout's consumer group, this happens when
>>> you ack the tuples emitted by the spout). The spout then crashes for some
>>> reason, or you redeploy the topology. The spout will pick up at offset 0
>>> when it restarts, because it is configured to always start at the
>> beginning
>>> of the partition.
>>> 
>>> UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST act exactly like the other
>> two
>>> modes if the spout hasn't committed anything. If the spout has committed
>>> some offsets, the restarted spout will pick up where it left off, at the
>>> last committed offset. Example: Say the spout is set to
>>> UNCOMMITTED_EARLIEST and we've just deployed it for the first time. The
>>> spout seeks to the earliest offset because it hasn't previously committed
>>> anything, so it starts at offset 0 and emits offsets 0-100 and commits
>> them
>>> once the tuples are acked. The spout crashes. The restarted spout will
>> pick
>>> up at offset 100, because that was the last committed offset before it
>>> crashed.
>>> 
>>> I hope this helps.
>>> 
>>> 2017-06-28 8:40 GMT+02:00 Zhechao Ma <mazhechaomaill...@gmail.com>:
>>> 
>>>> The storm-kafka-client document explains these two values just almost
>> the
>>>> same except the last word.
>>>> 
>>>> https://github.com/apache/storm/blob/master/docs/storm-kafka-client.md
>>>> 
>>>>  - UNCOMMITTED_EARLIEST (DEFAULT) means that the kafka spout polls
>>>>  records from the last committed offset, if any. If no offset has been
>>>>  committed, it behaves as EARLIEST.
>>>>  - UNCOMMITTED_LATEST means that the kafka spout polls records from the
>>>>  last committed offset, if any. If no offset has been committed, it
>>>> behaves
>>>>  as LATEST.
>>>> 
>>>> Or is that a mistake?
>>>> 
>>>> --
>>>> Thanks
>>>> Zhechao Ma
>>>> 
>> 
>> 
> 
> 
> -- 
> Thanks
> Zhechao Ma

Reply via email to