+1. Very useful during tuning and ongoing monitoring to track the cost of checkpointing (both serialization and I/O). It can also be used to identify skew.

-- sent from mobile

On Sep 25, 2016 9:10 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:

> We've seen cases where operator state continues to grow without bound
> either because the developer was unaware of the importance of keeping
> state small or because of some anomaly downstream. In such cases, the
> operators could get killed with an OOM exception because these
> checkpoints are building up in memory faster than they can be written
> to disk.
>
> These stats may be useful in such cases to identify the root cause of
> failure.
>
> Ram
>
> On Sun, Sep 25, 2016 at 7:39 AM, Sandesh Hegde <sand...@datatorrent.com>
> wrote:
>
> > Say it takes x MB size and y seconds to do the checkpoint. What does
> > the user do with that information?
> >
> > On Sun, Sep 25, 2016, 6:51 AM Tushar Gosavi <tus...@datatorrent.com>
> > wrote:
> >
> > > +1
> > >
> > > -Tushar
> > >
> > > On Sun, Sep 25, 2016, 8:54 AM Sanjay Pujare <san...@datatorrent.com>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Sanjay
> > > >
> > > > On Sun, Sep 25, 2016 at 7:06 AM, Devendra Tagare
> > > > <devend...@datatorrent.com> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Thanks,
> > > > > Dev
> > > > >
> > > > > On Sep 25, 2016 1:17 AM, "Pramod Immaneni"
> > > > > <pra...@datatorrent.com> wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > > On Sep 24, 2016, at 10:01 AM, Vlad Rozov
> > > > > > > <v.ro...@datatorrent.com> wrote:
> > > > > > >
> > > > > > > IMO, it may be useful to provide checkpoint statistics for
> > > > > > > example, total size of checkpoint for particular window or
> > > > > > > average size of checkpoints for a particular operator. Also,
> > > > > > > how long it takes to write checkpoints to storage.
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > > > Vlad
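
To make the statistics discussed in this thread concrete, here is a minimal sketch of how they might be derived from per-checkpoint records. Everything in it is an assumption for illustration: the `CheckpointStat` record, its field names, and the `factor` threshold used to flag skew are hypothetical and are not part of any existing Apex API. It only shows the arithmetic behind "total size per window", "average size and write time per operator", and a simple skew check.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CheckpointStat:
    """One checkpoint written by one operator (hypothetical record, not an Apex type)."""
    operator_id: int
    window_id: int
    size_bytes: int
    write_millis: float


def total_size_by_window(stats):
    """Total checkpoint size per window, summed over all operators."""
    totals = defaultdict(int)
    for s in stats:
        totals[s.window_id] += s.size_bytes
    return dict(totals)


def average_by_operator(stats):
    """Per operator: (average checkpoint size in bytes, average write time in ms)."""
    grouped = defaultdict(list)
    for s in stats:
        grouped[s.operator_id].append(s)
    return {
        op: (mean(x.size_bytes for x in group), mean(x.write_millis for x in group))
        for op, group in grouped.items()
    }


def skewed_operators(stats, factor=2.0):
    """Operators whose average size exceeds `factor` times the mean across operators.

    The threshold is an arbitrary illustrative choice, not a recommended value.
    """
    averages = {op: size for op, (size, _) in average_by_operator(stats).items()}
    if not averages:
        return []
    overall = mean(averages.values())
    return sorted(op for op, size in averages.items() if size > factor * overall)
```

With numbers like these, a user could spot an operator whose state keeps growing across windows, or a partition whose checkpoints are much larger than its peers, before it runs into the OOM case described above.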