+1 for this feature. Knowing the size of each checkpoint and the time it takes, at the operator level, will help with tuning and with understanding the overhead, if any.
-Venkatesh.

> On Sep 25, 2016, at 10:56 PM, Chinmay Kolhatkar <[email protected]> wrote:
>
> +1. very useful feature. We should also provide doc on how to use that
> information for tuning.
>
> On Sun, Sep 25, 2016 at 11:27 PM, Thomas Weise <[email protected]> wrote:
>
>> +1 very useful during tuning and ongoing monitoring for cost of
>> checkpointing (both, serialization and io). Can also be used to identify
>> skew.
>>
>> --
>> sent from mobile
>> On Sep 25, 2016 9:10 AM, "Munagala Ramanath" <[email protected]> wrote:
>>
>>> We've seen cases where operator state continues to grow without bound
>>> either because the developer was unaware of the importance of keeping
>>> state small or because of some anomaly downstream. In such cases, the
>>> operators could get killed with an OOM exception because these
>>> checkpoints are building up in memory faster than they can be written
>>> to disk.
>>>
>>> These stats may be useful in such cases to identify the root cause of
>>> failure.
>>>
>>> Ram
>>>
>>> On Sun, Sep 25, 2016 at 7:39 AM, Sandesh Hegde <[email protected]> wrote:
>>>
>>>> Say it takes x MB size and y seconds to do the checkpoint. What does
>>>> the user do with that information?
>>>>
>>>> On Sun, Sep 25, 2016, 6:51 AM Tushar Gosavi <[email protected]> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> -Tushar
>>>>>
>>>>> On Sun, Sep 25, 2016, 8:54 AM Sanjay Pujare <[email protected]> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Sanjay
>>>>>>
>>>>>> On Sun, Sep 25, 2016 at 7:06 AM, Devendra Tagare <[email protected]> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dev
>>>>>>>
>>>>>>> On Sep 25, 2016 1:17 AM, "Pramod Immaneni" <[email protected]> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>>> On Sep 24, 2016, at 10:01 AM, Vlad Rozov <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> IMO, it may be useful to provide checkpoint statistics for
>>>>>>>>> example, total size of checkpoint for particular window or
>>>>>>>>> average size of checkpoints for a particular operator. Also,
>>>>>>>>> how long it takes to write checkpoints to storage.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Vlad
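
For illustration, the statistics proposed in this thread (total checkpoint size for a window, average checkpoint size for an operator, and how long checkpoints take to write) could be aggregated roughly as in the sketch below. This is only a sketch under stated assumptions: `CheckpointStats` and all of its methods are hypothetical names invented for this example and are not part of the Apex API, and it assumes the platform can report a size in bytes and a write time in milliseconds for each checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the Apex API: collects per-checkpoint
// measurements and derives the statistics discussed in this thread.
final class CheckpointStats {
    // Running totals per operator: [0] = bytes, [1] = write millis, [2] = checkpoint count.
    private final Map<String, long[]> perOperator = new HashMap<>();
    // Total checkpoint bytes per window id, summed across all operators.
    private final Map<Long, Long> bytesPerWindow = new HashMap<>();

    /** Record one checkpoint of one operator for the given window. */
    void record(String operator, long windowId, long sizeBytes, long writeMillis) {
        long[] totals = perOperator.computeIfAbsent(operator, k -> new long[3]);
        totals[0] += sizeBytes;
        totals[1] += writeMillis;
        totals[2] += 1;
        bytesPerWindow.merge(windowId, sizeBytes, Long::sum);
    }

    /** Total size of all checkpoints recorded for a window (0 if none). */
    long totalBytesForWindow(long windowId) {
        return bytesPerWindow.getOrDefault(windowId, 0L);
    }

    /** Average checkpoint size for an operator (0.0 if none recorded). */
    double averageBytes(String operator) {
        long[] totals = perOperator.get(operator);
        return totals == null ? 0.0 : (double) totals[0] / totals[2];
    }

    /** Average time to write a checkpoint for an operator (0.0 if none recorded). */
    double averageWriteMillis(String operator) {
        long[] totals = perOperator.get(operator);
        return totals == null ? 0.0 : (double) totals[1] / totals[2];
    }
}
```

With numbers like these, an average size that keeps rising for one operator would point at the unbounded state growth Ram describes, and a large gap between operators could indicate skew.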
