Matthias,

For all the upgrade paths, is it possible to get rid of the 2nd rolling bounce?

For the in-place upgrade, it seems like primary difference between the 1st 
rolling bounce and the 2nd rolling bounce is to decide whether to send 
Subscription Version 2 or Subscription Version 3.  (Actually, there is another 
difference mentioned in that the KIP says that the 2nd rolling bounce should 
happen after all new state stores are created by the background thread. 
However, within the 2nd rolling bounce, we say that there is still a background 
thread, so it seems like is no actual requirement to wait for the new state 
stores to be created.)

The 2nd rolling bounce already knows how to deal with mixed-mode (having both 
Version 2 and Version 3 in the same consumer group). It seems like we could get 
rid of the 2nd bounce if we added logic (somehow/somewhere) such that:
* Instances send Subscription Version 2 until all instances are running the new 
code.
* Once all the instances are running the new code, then one at a time, the 
instances start sending Subscription V3. Leader still hands out Assignment 
Version 2, until all new state stores are ready.
* Once all instances report that new stores are ready, Leader sends out 
Assignment Version 3.
* Once an instance receives an Assignment Version 3, it can delete the old 
state store.

Doing it that way seems like it would reduce a lot of operator/deployment 
overhead. No need to do 2 rolling restarts. No need to monitor logs for state 
store rebuild. You just deploy it, and the instances update themselves.

What do you think?

The thing that made me think of this is that the "2 rolling bounces" is similar 
to what Kafka brokers have to do changes in inter.broker.protocol.version and 
log.message.format.version. And in the broker case, it seems like it would be 
possible (with some work of course) to modify kafka to allow us to do similar 
auto-detection of broker capabilities and automatically do a switchover from 
old/new versions. 

-James


> On Mar 9, 2018, at 10:38 AM, Bill Bejeck <bbej...@gmail.com> wrote:
> 
> Matthias,
> 
> Thanks for the KIP, it's a +1 from me.
> 
> I do have one question regarding the retrieval methods on the new
> interfaces.
> 
> Would want to consider adding one method with a Predicate that would allow
> for filtering records by the timestamp stored with the record?  Or is this
> better left for users to implement themselves once the data has been
> retrieved?
> 
> Thanks,
> Bill
> 
> On Thu, Mar 8, 2018 at 7:14 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
>> Matthias:
>> For my point #1, I don't have preference as to which separator is chosen.
>> Given the background you mentioned, current choice is good.
>> 
>> For #2, I think my proposal is better since it is closer to English
>> grammar.
>> 
>> Would be good to listen to what other people think.
>> 
>> On Thu, Mar 8, 2018 at 4:02 PM, Matthias J. Sax <matth...@confluent.io>
>> wrote:
>> 
>>> Thanks for the comments!
>>> 
>>> @Guozhang:
>>> 
>>> So far, there is one PR for the rebalance metadata upgrade fix
>>> (addressing the mentioned
>>> https://issues.apache.org/jira/browse/KAFKA-6054) It give a first
>>> impression how the metadata upgrade works including a system test:
>>> https://github.com/apache/kafka/pull/4636
>>> 
>>> I can share other PRs as soon as they are ready. I agree that the KIP is
>>> complex am I ok with putting out more code to give better discussion
>>> context.
>>> 
>>> @Ted:
>>> 
>>> I picked `_` instead of `-` to align with the `processing.guarantee`
>>> parameter that accepts `at_least_one` and `exactly_once` as values.
>>> Personally, I don't care about underscore vs dash but I prefer
>>> consistency. If you feel strong about it, we can also change it to `-`.
>>> 
>>> About the interface name: I am fine either way -- I stripped the `With`
>>> to keep the name a little shorter. Would be good to get feedback from
>>> others and pick the name the majority prefers.
>>> 
>>> @John:
>>> 
>>> We can certainly change it. I agree that it would not make a difference.
>>> I'll dig into the code to see if any of the two version might introduce
>>> undesired complexity and update the KIP if I don't hit an issue with
>>> putting the `-v2` to the store directory instead of `rocksdb-v2`
>>> 
>>> 
>>> -Matthias
>>> 
>>> 
>>> On 3/8/18 2:44 PM, John Roesler wrote:
>>>> Hey Matthias,
>>>> 
>>>> The KIP looks good to me. I had several questions queued up, but they
>>> were
>>>> all in the "rejected alternatives" section... oh, well.
>>>> 
>>>> One very minor thought re changing the state directory from
>>> "/<state.dir>/<
>>>> application.id>/<task.id>/rocksdb/storeName/" to "/<state.dir>/<
>>>> application.id>/<task.id>/rocksdb-v2/storeName/": if you put the "v2"
>>>> marker on the storeName part of the path (i.e., "/<state.dir>/<
>>>> application.id>/<task.id>/rocksdb/storeName-v2/"), then you get the
>> same
>>>> benefits without altering the high-level directory structure.
>>>> 
>>>> It may not matter, but I could imagine people running scripts to
>> monitor
>>>> rocksdb disk usage for each task, or other such use cases.
>>>> 
>>>> Thanks,
>>>> -John
>>>> 
>>>> On Thu, Mar 8, 2018 at 2:02 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> 
>>>>> Matthias:
>>>>> Nicely written KIP.
>>>>> 
>>>>> "in_place" : can this be "in-place" ? Underscore may sometimes be miss
>>>>> typed (as '-'). I think using '-' is more friendly to user.
>>>>> 
>>>>> public interface ReadOnlyKeyValueTimestampStore<K, V> {
>>>>> 
>>>>> Is ReadOnlyKeyValueStoreWithTimestamp better name for the class ?
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> On Thu, Mar 8, 2018 at 1:29 PM, Guozhang Wang <wangg...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> Hello Matthias, thanks for the KIP.
>>>>>> 
>>>>>> I've read through the upgrade patch section and it looks good to me,
>> if
>>>>> you
>>>>>> already have a WIP PR for it could you also share it here so that
>>> people
>>>>>> can take a look?
>>>>>> 
>>>>>> I'm +1 on the KIP itself. But large KIPs like this there are always
>>> some
>>>>>> devil hidden in the details, so I think it is better to have the
>>>>>> implementation in parallel along with the design discussion :)
>>>>>> 
>>>>>> 
>>>>>> Guozhang
>>>>>> 
>>>>>> 
>>>>>> On Wed, Mar 7, 2018 at 2:12 PM, Matthias J. Sax <
>> matth...@confluent.io
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I want to propose KIP-258 for the Streams API to allow storing
>>>>>>> timestamps in RocksDB. This feature is the basis to resolve multiple
>>>>>>> tickets (issues and feature requests).
>>>>>>> 
>>>>>>> Looking forward to your comments about this!
>>>>>>> 
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>> 258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB
>>>>>>> 
>>>>>>> 
>>>>>>> -Matthias
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> -- Guozhang
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 

Reply via email to