A quick question re: `We will add the above config parameter to *StreamsConfig*. During *StreamTask#commit()*, *StandbyTask#commit()*, and *GlobalUpdateStateTask#flushState()* we will check if the checkpoint interval has elapsed and write the checkpoint file.`
Will the writing of the checkpoint file happen before the flushing of the
state manager?

Guozhang

On Thu, Feb 9, 2017 at 10:48 AM, Matthias J. Sax <matth...@confluent.io> wrote:

> But 5 min means that we (in the worst case) need to replay data from the
> last 5 minutes to get the store ready.
>
> So why not go with the min possible value of 30 seconds to speed up this
> process if the impact is negligible anyway?
>
> What do you gain by being conservative?
>
>
> -Matthias
>
> On 2/9/17 2:54 AM, Damian Guy wrote:
> > Why shouldn't it be 5 minutes? ;-)
> > It is a finger in the air number. Based on the testing I did it shows that
> > there isn't much, if any, overhead when checkpointing a single store on the
> > commit interval. The default commit interval is 30 seconds, so it could
> > possibly be set to that. However, I'd prefer to be a little conservative so
> > 5 minutes seemed reasonable.
> >
> >
> > On Thu, 9 Feb 2017 at 10:25 Michael Noll <mich...@confluent.io> wrote:
> >
> >> Damian,
> >>
> >> could you elaborate briefly why the default value should be 5 minutes?
> >> What are the considerations, assumptions, etc. that go into picking this
> >> value?
> >>
> >> Right now, in the KIP and in this discussion, "5 mins" looks like a magic
> >> number to me. :-)
> >>
> >> -Michael
> >>
> >>
> >>
> >> On Thu, Feb 9, 2017 at 11:03 AM, Damian Guy <damian....@gmail.com> wrote:
> >>
> >>> I've run the SimpleBenchmark with checkpoint on and off to see what the
> >>> impact is. It appears that there is very little impact, if any. The numbers
> >>> with checkpointing on actually look better, but that is likely largely due
> >>> to external influences.
> >>>
> >>> In any case, I'm going to suggest we go with a default checkpoint interval
> >>> of 5 minutes. I've updated the KIP with this.
> >>>
> >>> commit every 10 seconds (no checkpoint)
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/34798/287372.83751939767/29.570664980746017
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/35942/278226.0308274442/28.62945857214401
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/34677/288375.58035585546/29.673847218617528
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/34677/288375.58035585546/29.673847218617528
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/31192/320595.02436522185/32.98922800718133
> >>>
> >>>
> >>> checkpoint every 10 seconds (same as commit interval)
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/36997/270292.185852907/27.81306592426413
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/32087/311652.69423754164/32.069062237043035
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/32895/303997.5680194558/31.281349749202004
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/33476/298721.4720994145/30.738439479029754
> >>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
> >>> 10000000/33196/301241.1133871551/30.99771056753826
> >>>
> >>> On Wed, 8 Feb 2017 at 09:02 Damian Guy <damian....@gmail.com> wrote:
> >>>
> >>>> Matthias,
> >>>>
> >>>> Fair point. I'll update the KIP.
> >>>> Thanks
> >>>>
> >>>> On Wed, 8 Feb 2017 at 05:49 Matthias J. Sax <matth...@confluent.io> wrote:
> >>>>
> >>>> Damian,
> >>>>
> >>>> I am not strict about it either. However, if there is no advantage in
> >>>> disabling it, we might not want to allow it. This would have the
> >>>> advantage of guarding users against accidentally switching it off.
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 2/3/17 2:03 AM, Damian Guy wrote:
> >>>>> Hi Matthias,
> >>>>>
> >>>>> It possibly doesn't make sense to disable it, but then I'm sure someone
> >>>>> will come up with a reason they don't want it!
> >>>>> I'm happy to change it such that the checkpoint interval must be > 0.
> >>>>>
> >>>>> Cheers,
> >>>>> Damian
> >>>>>
> >>>>> On Fri, 3 Feb 2017 at 01:29 Matthias J. Sax <matth...@confluent.io> wrote:
> >>>>>
> >>>>>> Thanks Damian.
> >>>>>>
> >>>>>> One more question: "Checkpointing is disabled if the checkpoint interval
> >>>>>> is set to a value <=0."
> >>>>>>
> >>>>>>
> >>>>>> Does it make sense to disable checkpointing? What's the tradeoff here?
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>>
> >>>>>> On 2/2/17 1:51 AM, Damian Guy wrote:
> >>>>>>> Hi Matthias,
> >>>>>>>
> >>>>>>> Thanks for the comments.
> >>>>>>>
> >>>>>>> 1. TBD - I need to do some performance tests and try and work out a
> >>>>>>> sensible default.
> >>>>>>> 2. Yes, you are correct. It could be a multiple of the
> >>>>>>> commit.interval.ms. But, that would also mean if you change the commit
> >>>>>>> interval - say you lower it, then you might also need to change the
> >>>>>>> checkpoint setting (i.e., you still only want to checkpoint every n
> >>>>>>> minutes).
> >>>>>>>
> >>>>>>> On Wed, 1 Feb 2017 at 23:46 Matthias J. Sax <matth...@confluent.io> wrote:
> >>>>>>>
> >>>>>>>> Thanks for the KIP Damian.
> >>>>>>>>
> >>>>>>>> I am wondering about two things:
> >>>>>>>>
> >>>>>>>> 1. what should be the default value for the new parameter?
> >>>>>>>> 2. why is the new parameter provided in ms?
> >>>>>>>>
> >>>>>>>> About (2): because
> >>>>>>>>
> >>>>>>>> "the minimum checkpoint interval will be the value of
> >>>>>>>> commit.interval.ms. In effect the actual checkpoint interval will be a
> >>>>>>>> multiple of the commit interval"
> >>>>>>>>
> >>>>>>>> it might be easier to just use a parameter that is "number-of-commit
> >>>>>>>> intervals".
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -Matthias
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2/1/17 7:29 AM, Damian Guy wrote:
> >>>>>>>>> Thanks for the comments Eno.
> >>>>>>>>> As for exactly once, I don't believe this matters as we are just
> >>>>>>>>> restoring the change-log, i.e., the result of the aggregations that
> >>>>>>>>> previously ran etc. So once initialized the state store will be in the
> >>>>>>>>> same state as it was before.
> >>>>>>>>> Having the checkpoint in a kafka topic is not ideal as the state is per
> >>>>>>>>> kafka streams instance. So each instance would need to start with a
> >>>>>>>>> unique id that is persistent.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Damian
> >>>>>>>>>
> >>>>>>>>> On Wed, 1 Feb 2017 at 13:20 Eno Thereska <eno.there...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> As a follow up to my previous comment, have you thought about writing
> >>>>>>>>>> the checkpoint to a topic instead of a local file? That would have the
> >>>>>>>>>> advantage that all metadata continues to be managed by Kafka, as well
> >>>>>>>>>> as fit with EoS. The potential disadvantage would be a slower latency,
> >>>>>>>>>> however if it is periodic as you mention, I'm not sure that would be a
> >>>>>>>>>> show stopper.
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Eno
> >>>>>>>>>>> On 1 Feb 2017, at 12:58, Eno Thereska <eno.there...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Damian, this is a good idea and will reduce the restore time.
> >>>>>>>>>>> Looking forward, with exactly once and support for transactions in
> >>>>>>>>>>> Kafka, I believe we'll have to add some support for rolling back
> >>>>>>>>>>> checkpoints, e.g., when a transaction is aborted. We need to be aware
> >>>>>>>>>>> of that and ideally anticipate a bit those needs in the KIP.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> Eno
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 1 Feb 2017, at 10:18, Damian Guy <damian....@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I would like to start the discussion on KIP-116:
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-116+-+Add+State+Store+Checkpoint+Interval+Configuration
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Damian

--
-- Guozhang
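
For readers following the thread, here is a minimal sketch of the interval check described in the KIP excerpt quoted at the top, in Java since Kafka Streams is a Java library. `CheckpointTimer` and `maybeCheckpoint` are hypothetical names made up for illustration; they are not Kafka Streams APIs. The sketch assumes only what the thread states: the checkpoint interval must be greater than 0, and the check runs at commit time, so the effective interval ends up being a multiple of the commit interval. It does not model the ordering relative to the state manager flush that the question at the top asks about.

```java
// Hypothetical helper for illustration only; not part of Kafka Streams.
final class CheckpointTimer {
    private final long checkpointIntervalMs;
    private long lastCheckpointMs;

    CheckpointTimer(long checkpointIntervalMs, long startMs) {
        // The thread settles on requiring an interval > 0 (no way to disable).
        if (checkpointIntervalMs <= 0) {
            throw new IllegalArgumentException("checkpoint interval must be > 0");
        }
        this.checkpointIntervalMs = checkpointIntervalMs;
        this.lastCheckpointMs = startMs;
    }

    // Intended to be called once per commit. Returns true when the interval
    // has elapsed, meaning the caller should write the checkpoint file now.
    boolean maybeCheckpoint(long nowMs) {
        if (nowMs - lastCheckpointMs >= checkpointIntervalMs) {
            lastCheckpointMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long commitIntervalMs = 30_000L;      // commit every 30 seconds
        long checkpointIntervalMs = 45_000L;  // deliberately not a multiple
        CheckpointTimer timer = new CheckpointTimer(checkpointIntervalMs, 0L);

        // Simulate four commits and report which ones trigger a checkpoint.
        for (int commit = 1; commit <= 4; commit++) {
            long nowMs = commit * commitIntervalMs;
            if (timer.maybeCheckpoint(nowMs)) {
                System.out.println("checkpoint at " + nowMs + " ms");
            }
        }
    }
}
```

With a 30-second commit interval and a 45-second checkpoint interval, the check fires at 60 s and 120 s, that is, on every second commit. This is the "multiple of the commit interval" effect discussed in the thread.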