+1 for this feature. Knowing the size of the checkpointed state and the time taken to checkpoint it at the operator level will help in tuning and in understanding the overheads, if any.
-Venkatesh.

> On Sep 25, 2016, at 10:56 PM, Chinmay Kolhatkar <chin...@datatorrent.com>
> wrote:
>
> +1. very useful feature. We should also provide doc on how to use that
> information for tuning.
>
> On Sun, Sep 25, 2016 at 11:27 PM, Thomas Weise <thomas.we...@gmail.com>
> wrote:
>
>> +1 very useful during tuning and ongoing monitoring for cost of
>> checkpointing (both, serialization and io). Can also be used to identify
>> skew.
>>
>> --
>> sent from mobile
>> On Sep 25, 2016 9:10 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
>>
>>> We've seen cases where operator state continues to grow without bound
>>> either because the developer was unaware of the importance of keeping
>>> state small or because of some anomaly downstream. In such cases, the
>>> operators could get killed with an OOM exception because these
>>> checkpoints are building up in memory faster than they can be written
>>> to disk.
>>>
>>> These stats may be useful in such cases to identify the root cause of
>>> failure.
>>>
>>> Ram
>>>
>>> On Sun, Sep 25, 2016 at 7:39 AM, Sandesh Hegde <sand...@datatorrent.com>
>>> wrote:
>>>
>>>> Say it takes x MB size and y seconds to do the checkpoint. What does the
>>>> user do with that information?
>>>>
>>>> On Sun, Sep 25, 2016, 6:51 AM Tushar Gosavi <tus...@datatorrent.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> -Tushar
>>>>>
>>>>> On Sun, Sep 25, 2016, 8:54 AM Sanjay Pujare <san...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Sanjay
>>>>>>
>>>>>> On Sun, Sep 25, 2016 at 7:06 AM, Devendra Tagare
>>>>>> <devend...@datatorrent.com> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dev
>>>>>>>
>>>>>>> On Sep 25, 2016 1:17 AM, "Pramod Immaneni" <pra...@datatorrent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>>> On Sep 24, 2016, at 10:01 AM, Vlad Rozov <v.ro...@datatorrent.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> IMO, it may be useful to provide checkpoint statistics for example,
>>>>>>>>> total size of checkpoint for particular window or average size of
>>>>>>>>> checkpoints for a particular operator. Also, how long it takes to
>>>>>>>>> write checkpoints to storage.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Vlad
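
To make the statistics discussed in this thread concrete (total checkpoint size per window, average size and write time per operator, and spotting state that keeps growing), here is a minimal Python sketch. It is only an illustration and does not use any Apex API: the record layout (`operator`, `window_id`, `size_bytes`, `write_ms`), the function names, the sample data, and the growth factor of 2.0 are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CheckpointRecord:
    """One checkpoint of one operator (hypothetical record layout)."""
    operator: str
    window_id: int
    size_bytes: int
    write_ms: float


def total_size_per_window(records):
    """Sum checkpoint sizes across all operators for each window."""
    totals = defaultdict(int)
    for r in records:
        totals[r.window_id] += r.size_bytes
    return dict(totals)


def per_operator_stats(records):
    """Average checkpoint size and write time for each operator."""
    by_op = defaultdict(list)
    for r in records:
        by_op[r.operator].append(r)
    return {
        op: {
            "avg_size_bytes": mean(r.size_bytes for r in recs),
            "avg_write_ms": mean(r.write_ms for r in recs),
        }
        for op, recs in by_op.items()
    }


def growing_operators(records, factor=2.0):
    """Flag operators whose latest checkpoint is at least `factor` times
    their first one: a crude signal of state that keeps growing."""
    by_op = defaultdict(list)
    for r in sorted(records, key=lambda r: r.window_id):
        by_op[r.operator].append(r.size_bytes)
    return sorted(
        op for op, sizes in by_op.items()
        if len(sizes) > 1 and sizes[0] > 0 and sizes[-1] >= factor * sizes[0]
    )


# Made-up sample data: "dedup" keeps growing, "parser" stays flat.
records = [
    CheckpointRecord("dedup", 1, 1_000, 10.0),
    CheckpointRecord("dedup", 2, 2_000, 20.0),
    CheckpointRecord("dedup", 3, 4_000, 45.0),
    CheckpointRecord("parser", 1, 500, 5.0),
    CheckpointRecord("parser", 2, 500, 5.0),
    CheckpointRecord("parser", 3, 500, 5.0),
]
```

With numbers like these, a user could compare `avg_size_bytes` across partitions of the same operator to look for skew, or watch `growing_operators` to catch unbounded state before it leads to an OOM. The threshold would need tuning per application.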