Ok. lets close this KIP off then as it isn't needed at the moment. We can revive later if needed.
On Tue, 14 Feb 2017 at 04:16 Eno Thereska <eno.there...@gmail.com> wrote: > Even if users commit on every record, the expensive part will not be the > checkpointing proposed in this KIP, but the rest of the commit. > > Eno > > > > On 13 Feb 2017, at 23:46, Guozhang Wang <wangg...@gmail.com> wrote: > > > > I think I'm OK to always enable checkpointing, but I'm not sure if we > want > > to checkpoint on every commit. Since in the extreme case users can commit > > on completed processing each record. So I think it is still valuable to > > have a checkpoint internal config in this KIP, which can be ignored if > EOS > > is turned on. That being said, if most people are favoring checkpointing > on > > each commit we can try that with this as well, since it won't change any > > public APIs and we can still add this config in the future if we do > observe > > some users reporting it has huge perf impacts. > > > > > > > > Guozhang > > > > On Fri, Feb 10, 2017 at 12:20 PM, Damian Guy <damian....@gmail.com> > wrote: > > > >> I'm fine with that. Gouzhang? > >> On Fri, 10 Feb 2017 at 19:45, Matthias J. Sax <matth...@confluent.io> > >> wrote: > >> > >>> I am actually supporting Eno's view: checkpoint on every commit. > >>> > >>> @Dhwani: I understand your view and did raise the same question about > >>> performance trade-off with checkpoiting enabled/disabled etc. However, > >>> it seems that writing the checkpoint file is super cheap -- thus, there > >>> is nothing to gain performance wise by disabling it. > >>> > >>> For Streams EoS we do not need the checkpoint file -- but we should > have > >>> a switch for EoS anyway and can disable the checkpoint file for this > >>> case. And even if there is no switch and we enable EoS all the time, we > >>> can get rid of the checkpoint file overall (making the parameter > >> obsolete). > >>> > >>> IMHO, if the config parameter is not really useful, we should not have > >> it. > >>> > >>> > >>> -Matthias > >>> > >>> > >>> On 2/10/17 9:27 AM, Damian Guy wrote: > >>>> Gouzhang, Thanks for the clarification. Understood. > >>>> > >>>> Eno, you are correct if we just used commit interval then we wouldn't > >>> need > >>>> a KIP. But, then we'd have no way of turning it off. > >>>> > >>>> On Fri, 10 Feb 2017 at 17:14 Eno Thereska <eno.there...@gmail.com> > >>> wrote: > >>>> > >>>>> A quick check: the checkpoint file is not new, we're just exposing a > >>> knob > >>>>> on when to set it, right? Would turning if off still do what it does > >>> today > >>>>> (i.e., write the checkpoint at the end when the user quits?) So it's > >>> not a > >>>>> new feature as such, I was only recommending we dial up the frequency > >> by > >>>>> default. With that option arguably we don't even need a KIP. > >>>>> > >>>>> Eno > >>>>> > >>>>> > >>>>> > >>>>>> On 10 Feb 2017, at 17:02, Guozhang Wang <wangg...@gmail.com> wrote: > >>>>>> > >>>>>> Damian, > >>>>>> > >>>>>> I was thinking if it is a new failure scenarios but as Eno pointed > >> out > >>> it > >>>>>> was not. > >>>>>> > >>>>>> Another thing I was considering is if it has any impact for > >>> incorporating > >>>>>> KIP-98 to avoid duplicates: if there is a failure in the middle of a > >>>>>> transaction, then upon recovery we cannot rely on the local state > >> store > >>>>>> file even if the checkpoint file exists, since the local state store > >>> file > >>>>>> may not be at the transaction boundaries. But since Streams will > >> likely > >>>>> to > >>>>>> have EOS as an opt-in I think it is still worthwhile to add this > >>> feature, > >>>>>> just keeping in mind that when EOS is turned on it may cease to be > >>>>>> effective. > >>>>>> > >>>>>> And yes, I'd suggest we leave the config value to be possibly > >>>>> non-positive > >>>>>> to indicate not turning on this feature for the reason above: if it > >>> will > >>>>>> not be effective then we want to leave it as an option to be turned > >>> off. > >>>>>> > >>>>>> Guozhang > >>>>>> > >>>>>> > >>>>>> On Fri, Feb 10, 2017 at 8:06 AM, Eno Thereska < > >> eno.there...@gmail.com> > >>>>>> wrote: > >>>>>> > >>>>>>> The overhead of writing to the checkpoint file should be much, much > >>>>>>> smaller than the overall overhead of doing a commit, so I think > >> tuning > >>>>> the > >>>>>>> commit time is sufficient to guide performance tradeoffs. > >>>>>>> > >>>>>>> Eno > >>>>>>> > >>>>>>>> On 10 Feb 2017, at 13:08, Dhwani Katagade < > >>>>> dhwani_katag...@persistent.co > >>>>>>> .in> wrote: > >>>>>>>> > >>>>>>>> May be for fine tuning the performance. > >>>>>>>> Say we don't need the checkpointing and would like to gain the lil > >>> bit > >>>>>>> of performance improvement by turning it off. > >>>>>>>> The trade off is between giving people control knobs vs > >> complicating > >>>>> the > >>>>>>> complete set of knobs. > >>>>>>>> > >>>>>>>> -dk > >>>>>>>> > >>>>>>>> On Friday 10 February 2017 04:05 PM, Eno Thereska wrote: > >>>>>>>>> I can't see why users would care to turn it off. > >>>>>>>>> > >>>>>>>>> Eno > >>>>>>>>>> On 10 Feb 2017, at 10:29, Damian Guy <damian....@gmail.com> > >> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi Eno, > >>>>>>>>>> > >>>>>>>>>> Sounds good to me. The only reason i can think of is if we want > >> to > >>> be > >>>>>>> able > >>>>>>>>>> to turn it off. > >>>>>>>>>> Gouzhang - thoughts? > >>>>>>>>>> > >>>>>>>>>> On Fri, 10 Feb 2017 at 10:28 Eno Thereska < > >> eno.there...@gmail.com> > >>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Question: if checkpointing is so cheap why not do it every > >> commit > >>>>>>>>>>> interval? That way we can get rid of this extra config variable > >>> and > >>>>>>> just > >>>>>>>>>>> use the existing commit interval. > >>>>>>>>>>> > >>>>>>>>>>> Less tuning knobs. > >>>>>>>>>>> > >>>>>>>>>>> Eno > >>>>>>>>>>> > >>>>>>>>>>>> On 10 Feb 2017, at 09:27, Damian Guy <damian....@gmail.com> > >>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Gouzhang, > >>>>>>>>>>>> > >>>>>>>>>>>> You've confused me. The failure scenarios you have described > >> are > >>>>> the > >>>>>>> same > >>>>>>>>>>>> as they are today. With the checkpoint files in place less > data > >>>>> will > >>>>>>> be > >>>>>>>>>>>> replayed, so there will be fewer duplicates. > >>>>>>>>>>>> > >>>>>>>>>>>> Are you saying you'd like the option to turn checkpointing > off? > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Damian > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, 9 Feb 2017 at 21:55 Guozhang Wang <wangg...@gmail.com > > > >>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Eno, > >>>>>>>>>>>>> > >>>>>>>>>>>>> You are right, it is not a new scenario. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thinking a bit more on how we could incorporate KIP-98 in > >>>>> Streams, I > >>>>>>>>>>> feel > >>>>>>>>>>>>> that if EOS is turned on inside Streams, then we probably > >> cannot > >>>>>>> always > >>>>>>>>>>>>> resume from the checkpointed offsets as it is not guaranteed > >> to > >>> be > >>>>>>>>>>>>> "consistent"; but since EOS may not be turned on by default > >> this > >>>>> is > >>>>>>>>>>> still > >>>>>>>>>>>>> worthwhile to add this feature I guess. > >>>>>>>>>>>>> > >>>>>>>>>>>>> About the default config values: I think the default value of > >> 5 > >>>>> min > >>>>>>> is > >>>>>>>>>>> OK > >>>>>>>>>>>>> to me, since restoration is usually faster than normal > >>> processing > >>>>>>>>>>> (unless > >>>>>>>>>>>>> your traffic was really high), about allowing it to be > "turned > >>>>> off" > >>>>>>>>>>> with a > >>>>>>>>>>>>> non-positive value: I feel there are still values to keep > this > >>>>> door > >>>>>>>>>>> open as > >>>>>>>>>>>>> in the future if EOS is turned on, people may just want to > >> turn > >>>>> off > >>>>>>>>>>>>> checkpointing anyways, or there maybe other scenarios that we > >>> have > >>>>>>> not > >>>>>>>>>>>>> realized yet. On the other hand, I would argue that it is > less > >>>>>>> likely > >>>>>>>>>>> users > >>>>>>>>>>>>> mistakenly set it to a non-positive value. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Thu, Feb 9, 2017 at 1:03 PM, Eno Thereska < > >>>>>>> eno.there...@gmail.com> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi Guozhang, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> It seems to me we have the same semantics today. Are you > >> saying > >>>>>>> there > >>>>>>>>>>> is > >>>>>>>>>>>>> a > >>>>>>>>>>>>>> new failure scenario? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> Eno > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On 9 Feb 2017, at 19:42, Guozhang Wang <wangg...@gmail.com > > > >>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> More specifically, here is my reasoning of failure cases, > >> and > >>>>>>> would > >>>>>>>>>>>>> like > >>>>>>>>>>>>>> to > >>>>>>>>>>>>>>> get your feedbacks: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> *StreamTask* > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> For stream-task, the committing order is 1) flush state > (may > >>>>> send > >>>>>>> more > >>>>>>>>>>>>>>> records to changelog in producer), 2) flush producer, 3) > >>> commit > >>>>>>>>>>>>> upstream > >>>>>>>>>>>>>>> offsets. My understanding is that the writing of the > >>> checkpoint > >>>>>>> file > >>>>>>>>>>>>> will > >>>>>>>>>>>>>>> between 2) and 3). So thatt he new order will be 1) flush > >>> state, > >>>>>>> 2) > >>>>>>>>>>>>> flush > >>>>>>>>>>>>>>> producer, 3) write checkpoint file (when necessary), 4) > >> commit > >>>>>>>>>>> upstream > >>>>>>>>>>>>>>> offsets. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> And we have a bunch of "changelog offsets" regarding the > >>> state: > >>>>> a) > >>>>>>>>>>>>> offset > >>>>>>>>>>>>>>> corresponding to the image of the persistent file, name it > >>> point > >>>>>>> A, b) > >>>>>>>>>>>>>> log > >>>>>>>>>>>>>>> end offset, name it offset B, c) checkpoint file recorded > >>>>> offset, > >>>>>>> name > >>>>>>>>>>>>> it > >>>>>>>>>>>>>>> offset C, d) offset corresponding to the current committed > >>>>>>> upstream > >>>>>>>>>>>>>> offset, > >>>>>>>>>>>>>>> name it offset D. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Now let's talk about the failure cases: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If there is a crash between 1) and 2), then A > B = C = D. > >> In > >>>>> this > >>>>>>>>>>>>> case, > >>>>>>>>>>>>>> if > >>>>>>>>>>>>>>> we restore, we will replay no logs at all since B = C while > >>> the > >>>>>>>>>>>>>> persistent > >>>>>>>>>>>>>>> state file is actually "ahead of time", and we will start > >>>>>>> reprocessing > >>>>>>>>>>>>>>> since from the input offset corresponding to D = B < A and > >>> hence > >>>>>>> have > >>>>>>>>>>>>>> some > >>>>>>>>>>>>>>> duplicated, *which will be incorrect* if the update logic > >>>>> involve > >>>>>>>>>>>>> reading > >>>>>>>>>>>>>>> the state store values as well (i.e. not a blind write), > >> e.g. > >>>>>>>>>>>>>> aggregations. > >>>>>>>>>>>>>>> If there is a crash between 2) and 3), then A = B > C = D. > >>> When > >>>>> we > >>>>>>>>>>>>>> restore, > >>>>>>>>>>>>>>> we will replay from C -> B = A, and then start reprocessing > >>> from > >>>>>>> input > >>>>>>>>>>>>>>> offset corresponding to D < A, and same issue applies as > >>> above. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If there is a crash between 3) and 4), then A = B = C > D. > >>> When > >>>>> we > >>>>>>>>>>>>>> restore, > >>>>>>>>>>>>>>> we will not replay, and then start reprocessing from input > >>>>> offset > >>>>>>>>>>>>>>> corresponding to D < A, and same issue applies as above. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> *StandbyTask* > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> We only do one operation today, which is 1) flush state, I > >>> think > >>>>>>> we > >>>>>>>>>>>>> will > >>>>>>>>>>>>>>> add the writing of the checkpoint file after it as step 2). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Failure cases again: offset A -> correspond to the image of > >>> the > >>>>>>> file, > >>>>>>>>>>>>>>> offset B -> changelog end offset, offset C -> written as in > >>> the > >>>>>>>>>>>>>> checkpoint > >>>>>>>>>>>>>>> file. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If there is a crash between 1) and 2), then B >= A > C (B > >>> = A > >>>>>>> because > >>>>>>>>>>>>> we > >>>>>>>>>>>>>>> are reading from changelog topic so A will never be greater > >>> than > >>>>>>> B), > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1) and if this task resumes as a standby task, we will > >> resume > >>>>>>>>>>>>> restoration > >>>>>>>>>>>>>>> from offset C, and a few duplicates from C -> A will be > >>> applied > >>>>>>> again > >>>>>>>>>>>>> to > >>>>>>>>>>>>>>> local state files, then continue from A -> B, *this is OK* > >>> since > >>>>>>> they > >>>>>>>>>>>>> do > >>>>>>>>>>>>>>> not incur any computations hence no side effects and are > all > >>>>>>>>>>>>> idempotent. > >>>>>>>>>>>>>>> 2) and if this task resumes as a stream task, we will > replay > >>>>>>>>>>> changelogs > >>>>>>>>>>>>>>> from C -> A, with duplicated updates, and then from A -> B. > >>> This > >>>>>>> is > >>>>>>>>>>>>> also > >>>>>>>>>>>>>> OK > >>>>>>>>>>>>>>> for the same reason as above. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> So it seems to me that this is not safe for a StreamTask, > or > >>>>>>> maybe the > >>>>>>>>>>>>>>> writing of the checkpoint file in your mind is different? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Thu, Feb 9, 2017 at 11:02 AM, Guozhang Wang < > >>>>>>> wangg...@gmail.com> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> A quick question re: `We will add the above config > >> parameter > >>> to > >>>>>>>>>>>>>>>> *StreamsConfig*. During *StreamTask#commit()*, > >>>>>>>>>>> *StandbyTask#commit()*, > >>>>>>>>>>>>>>>> and *GlobalUpdateStateTask#flushState()* we will check if > >> the > >>>>>>>>>>>>>> checkpoint > >>>>>>>>>>>>>>>> interval has elapsed and write the checkpoint file.` > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Will the writing of the checkpoint file happen before the > >>>>>>> flushing of > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> state manager? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Thu, Feb 9, 2017 at 10:48 AM, Matthias J. Sax < > >>>>>>>>>>>>> matth...@confluent.io > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> But 5 min means, that we (in the worst case) need to > reply > >>>>> data > >>>>>>> from > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>> last 5 minutes to get the store ready. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> So why not go with the min possible value of 30 seconds > to > >>>>>>> speed up > >>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>> process if the impact is negligible anyway? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> What do you gain by being conservative? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On 2/9/17 2:54 AM, Damian Guy wrote: > >>>>>>>>>>>>>>>>>> Why shouldn't it be 5 minutes? ;-) > >>>>>>>>>>>>>>>>>> It is a finger in the air number. Based on the testing i > >>> did > >>>>> it > >>>>>>>>>>>>> shows > >>>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>> there isn't much, if any, overhead when checkpointing a > >>>>> single > >>>>>>>>>>> store > >>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> commit interval. The default commit interval is 30 > >> seconds, > >>>>> so > >>>>>>> it > >>>>>>>>>>>>>> could > >>>>>>>>>>>>>>>>>> possibly be set to that. However, i'd prefer to be a > >> little > >>>>>>>>>>>>>>>>> conservative so > >>>>>>>>>>>>>>>>>> 5 minutes seemed reasonable. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Thu, 9 Feb 2017 at 10:25 Michael Noll < > >>>>> mich...@confluent.io > >>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> Damian, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> could you elaborate briefly why the default value > should > >>> be > >>>>> 5 > >>>>>>>>>>>>>> minutes? > >>>>>>>>>>>>>>>>>>> What are the considerations, assumptions, etc. that go > >>> into > >>>>>>>>>>> picking > >>>>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>>> value? > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Right now, in the KIP and in this discussion, "5 mins" > >>> looks > >>>>>>> like > >>>>>>>>>>> a > >>>>>>>>>>>>>>>>> magic > >>>>>>>>>>>>>>>>>>> number to me. :-) > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> -Michael > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Thu, Feb 9, 2017 at 11:03 AM, Damian Guy < > >>>>>>> damian....@gmail.com > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> I've ran the SimpleBenchmark with checkpoint on and > off > >>> to > >>>>>>> see > >>>>>>>>>>>>> what > >>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>> impact is. It appears that there is very little > impact, > >>> if > >>>>>>> any. > >>>>>>>>>>>>> The > >>>>>>>>>>>>>>>>>>> numbers > >>>>>>>>>>>>>>>>>>>> with checkpointing on actually look better, but that > is > >>>>>>> likely > >>>>>>>>>>>>>> largely > >>>>>>>>>>>>>>>>>>> due > >>>>>>>>>>>>>>>>>>>> to external influences. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> In any case, i'm going to suggest we go with a default > >>>>>>> checkpoint > >>>>>>>>>>>>>>>>>>> interval > >>>>>>>>>>>>>>>>>>>> of 5 minutes. I've update the KIP with this. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> commit every 10 seconds (no checkpoint) > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/34798/287372.83751939767/29.570664980746017 > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/35942/278226.0308274442/28.62945857214401 > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/34677/288375.58035585546/29.673847218617528 > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/34677/288375.58035585546/29.673847218617528 > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/31192/320595.02436522185/32.98922800718133 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> checkpoint every 10 seconds (same as commit interval) > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/36997/270292.185852907/27.81306592426413 > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/32087/311652.69423754164/32.069062237043035 > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/32895/303997.5680194558/31.281349749202004 > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/33476/298721.4720994145/30.738439479029754 > >>>>>>>>>>>>>>>>>>>> Streams Performance [records/latency/rec-sec/MB-sec > >>>>>>>>>>> source+store]: > >>>>>>>>>>>>>>>>>>>> 10000000/33196/301241.1133871551/30.99771056753826 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Wed, 8 Feb 2017 at 09:02 Damian Guy < > >>>>> damian....@gmail.com > >>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>> Matthias, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Fair point. I'll update it the KIP. > >>>>>>>>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Wed, 8 Feb 2017 at 05:49 Matthias J. Sax < > >>>>>>>>>>>>> matth...@confluent.io > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>> Damian, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I am not strict about it either. However, if there is > >> no > >>>>>>>>>>>>> advantage > >>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>>>> disabling it, we might not want to allow it. This > >> would > >>>>>>> have the > >>>>>>>>>>>>>>>>>>>>> advantage to guard users to accidentally switch it > >> off. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On 2/3/17 2:03 AM, Damian Guy wrote: > >>>>>>>>>>>>>>>>>>>>>> Hi Matthias, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> It possibly doesn't make sense to disable it, but > >> then > >>>>> i'm > >>>>>>> sure > >>>>>>>>>>>>>>>>>>> someone > >>>>>>>>>>>>>>>>>>>>>> will come up with a reason they don't want it! > >>>>>>>>>>>>>>>>>>>>>> I'm happy to change it such that the checkpoint > >>> interval > >>>>>>> must > >>>>>>>>>>>>> be > > >>>>>>>>>>>>>>>>> 0. > >>>>>>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>>>>>> Damian > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> On Fri, 3 Feb 2017 at 01:29 Matthias J. Sax < > >>>>>>>>>>>>>> matth...@confluent.io> > >>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> Thanks Damian. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> One more question: "Checkpointing is disabled if > the > >>>>>>>>>>> checkpoint > >>>>>>>>>>>>>>>>>>>> interval > >>>>>>>>>>>>>>>>>>>>>>> is set to a value <=0." > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Does it make sense to disable check pointing? > What's > >>> the > >>>>>>>>>>>>> tradeoff > >>>>>>>>>>>>>>>>>>>> here? > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On 2/2/17 1:51 AM, Damian Guy wrote: > >>>>>>>>>>>>>>>>>>>>>>>> Hi Matthias, > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the comments. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> 1. TBD - i need to do some performance tests and > >> try > >>>>> and > >>>>>>> work > >>>>>>>>>>>>>> out > >>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>>>>>>> sensible default. > >>>>>>>>>>>>>>>>>>>>>>>> 2. Yes, you are correct. It could be a multiple of > >>> the > >>>>>>>>>>>>>>>>>>>>>>> commit.interval.ms. > >>>>>>>>>>>>>>>>>>>>>>>> But, that would also mean if you change the commit > >>>>>>> interval - > >>>>>>>>>>>>>> say > >>>>>>>>>>>>>>>>>>> you > >>>>>>>>>>>>>>>>>>>>>>> lower > >>>>>>>>>>>>>>>>>>>>>>>> it, then you might also need to change the > >> checkpoint > >>>>>>> setting > >>>>>>>>>>>>>>>>> (i.e, > >>>>>>>>>>>>>>>>>>>> you > >>>>>>>>>>>>>>>>>>>>>>>> still only want to checkpoint every n minutes). > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> On Wed, 1 Feb 2017 at 23:46 Matthias J. Sax < > >>>>>>>>>>>>>>>>> matth...@confluent.io > >>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Damian. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> I am wondering about two things: > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> 1. what should be the default value for the new > >>>>>>> parameter? > >>>>>>>>>>>>>>>>>>>>>>>>> 2. why is the new parameter provided in ms? > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> About (2): because > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> "the minimum checkpoint interval will be the > value > >>> of > >>>>>>>>>>>>>>>>>>>>>>>>> commit.interval.ms. In effect the actual > >> checkpoint > >>>>>>>>>>> interval > >>>>>>>>>>>>>>>>> will > >>>>>>>>>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>>>>>>>> multiple of the commit interval" > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> it might be easier to just use an parameter that > >> is > >>>>>>>>>>>>>>>>>>>> "number-or-commit > >>>>>>>>>>>>>>>>>>>>>>>>> intervals". > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> On 2/1/17 7:29 AM, Damian Guy wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the comments Eno. > >>>>>>>>>>>>>>>>>>>>>>>>>> As for exactly once, i don't believe this > matters > >>> as > >>>>>>> we are > >>>>>>>>>>>>>> just > >>>>>>>>>>>>>>>>>>>>>>>>> restoring > >>>>>>>>>>>>>>>>>>>>>>>>>> the change-log, i.e, the result of the > >> aggregations > >>>>>>> that > >>>>>>>>>>>>>>>>>>> previously > >>>>>>>>>>>>>>>>>>>>> ran > >>>>>>>>>>>>>>>>>>>>>>>>>> etc. So once initialized the state store will be > >> in > >>>>> the > >>>>>>>>>>> same > >>>>>>>>>>>>>>>>>>> state > >>>>>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>>>>>>>>>>> was before. > >>>>>>>>>>>>>>>>>>>>>>>>>> Having the checkpoint in a kafka topic is not > >> ideal > >>>>> as > >>>>>>> the > >>>>>>>>>>>>>> state > >>>>>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>> per > >>>>>>>>>>>>>>>>>>>>>>>>>> kafka streams instance. So each instance would > >> need > >>>>> to > >>>>>>>>>>> start > >>>>>>>>>>>>>>>>>>> with a > >>>>>>>>>>>>>>>>>>>>>>>>> unique > >>>>>>>>>>>>>>>>>>>>>>>>>> id that is persistent. > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>>>>>>>>>> Damian > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 1 Feb 2017 at 13:20 Eno Thereska < > >>>>>>>>>>>>>>>>> eno.there...@gmail.com > >>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>> As a follow up to my previous comment, have you > >>>>>>> thought > >>>>>>>>>>>>> about > >>>>>>>>>>>>>>>>>>>>> writing > >>>>>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>>>> checkpoint to a topic instead of a local file? > >>> That > >>>>>>> would > >>>>>>>>>>>>>> have > >>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>>>> advantage that all metadata continues to be > >>> managed > >>>>> by > >>>>>>>>>>>>> Kafka, > >>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>>>> well > >>>>>>>>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>>>>>>>>>> fit with EoS. The potential disadvantage would > >> be > >>> a > >>>>>>> slower > >>>>>>>>>>>>>>>>>>>> latency, > >>>>>>>>>>>>>>>>>>>>>>>>> however > >>>>>>>>>>>>>>>>>>>>>>>>>>> if it is periodic as you mention, I'm not sure > >>> that > >>>>>>> would > >>>>>>>>>>>>> be > >>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>>> show > >>>>>>>>>>>>>>>>>>>>>>>>> stopper. > >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>>>>>>>>>>>>>> Eno > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On 1 Feb 2017, at 12:58, Eno Thereska < > >>>>>>>>>>>>>> eno.there...@gmail.com > >>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Damian, this is a good idea and will > >>> reduce > >>>>>>> the > >>>>>>>>>>>>>> restore > >>>>>>>>>>>>>>>>>>>>> time. > >>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward, with exactly once and support > >> for > >>>>>>>>>>>>>> transactions > >>>>>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>>>>>>>> Kafka, I > >>>>>>>>>>>>>>>>>>>>>>>>>>> believe we'll have to add some support for > >> rolling > >>>>>>> back > >>>>>>>>>>>>>>>>>>>> checkpoints, > >>>>>>>>>>>>>>>>>>>>>>>>> e.g., > >>>>>>>>>>>>>>>>>>>>>>>>>>> when a transaction is aborted. We need to be > >> aware > >>>>> of > >>>>>>> that > >>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>> ideally > >>>>>>>>>>>>>>>>>>>>>>>>>>> anticipate a bit those needs in the KIP. > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Eno > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 1 Feb 2017, at 10:18, Damian Guy < > >>>>>>>>>>>>> damian....@gmail.com> > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to start the discussion on > >> KIP-116: > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/ > >> confluence/display/KAFKA/KIP- > >>>>>>>>>>>>>>>>>>>> 116+-+Add+State+Store+Checkpoint+Interval+ > >> Configuration > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Damian > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> -- > >>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> DISCLAIMER > >>>>>>>> ========== > >>>>>>>> This e-mail may contain privileged and confidential information > >> which > >>>>> is > >>>>>>> the property of Persistent Systems Ltd. It is intended only for the > >>> use > >>>>> of > >>>>>>> the individual or entity to which it is addressed. If you are not > >> the > >>>>>>> intended recipient, you are not authorized to read, retain, copy, > >>> print, > >>>>>>> distribute or use this message. If you have received this > >>> communication > >>>>> in > >>>>>>> error, please notify the sender and delete all copies of this > >> message. > >>>>>>> Persistent Systems Ltd. does not accept any liability for virus > >>> infected > >>>>>>> mails. > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> -- Guozhang > >>>>> > >>>>> > >>>> > >>> > >>> > >> > > > > > > > > -- > > -- Guozhang > >