+1. Very useful during tuning and ongoing monitoring to track the cost of
checkpointing (both serialization and I/O). Can also be used to identify
skew.
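
To make the idea concrete, here is a rough sketch of the kind of numbers I mean. It is not Apex code or an Apex API: `pickle` stands in for whatever serializer the operator uses, `sink` is any writable byte stream standing in for checkpoint storage, and the names `checkpoint_stats` and `skew_ratio` are made up for illustration.

```python
import io
import pickle
import time
from statistics import mean


def checkpoint_stats(state, sink):
    """Serialize `state`, write it to `sink`, and report size and timings.

    Assumption: pickle is only a stand-in for the real operator serializer.
    """
    t0 = time.perf_counter()
    payload = pickle.dumps(state)   # serialization cost
    t1 = time.perf_counter()
    sink.write(payload)             # I/O cost
    t2 = time.perf_counter()
    return {
        "size_bytes": len(payload),
        "serialize_seconds": t1 - t0,
        "write_seconds": t2 - t1,
    }


def skew_ratio(sizes_by_partition):
    """Largest per-partition checkpoint size divided by the mean size.

    A value near 1.0 means the partitions are balanced; a much larger
    value suggests one partition is holding most of the state.
    """
    sizes = list(sizes_by_partition.values())
    return max(sizes) / mean(sizes)
```

Calling `checkpoint_stats(state, io.BytesIO())` once per checkpoint window and feeding the per-partition `size_bytes` values into `skew_ratio` would give the size, timing, and skew figures discussed here.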

On Sep 25, 2016 9:10 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:

> We've seen cases where operator state continues to grow without bound,
> either because the developer was unaware of the importance of keeping
> state small or because of some anomaly downstream. In such cases, the
> operators could get killed with an OOM exception because these
> checkpoints build up in memory faster than they can be written to disk.
>
> These stats may be useful in such cases to identify the root cause of
> failure.
>
> Ram
>
> On Sun, Sep 25, 2016 at 7:39 AM, Sandesh Hegde <sand...@datatorrent.com>
> wrote:
>
> > Say a checkpoint takes x MB and y seconds. What does the user do with
> > that information?
> >
> > On Sun, Sep 25, 2016, 6:51 AM Tushar Gosavi <tus...@datatorrent.com>
> > wrote:
> >
> > > +1
> > >
> > > -Tushar
> > >
> > > On Sun, Sep 25, 2016, 8:54 AM Sanjay Pujare <san...@datatorrent.com>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Sanjay
> > > >
> > > >
> > > > On Sun, Sep 25, 2016 at 7:06 AM, Devendra Tagare <
> > > > devend...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Thanks,
> > > > > Dev
> > > > >
> > > > > On Sep 25, 2016 1:17 AM, "Pramod Immaneni" <pra...@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > > On Sep 24, 2016, at 10:01 AM, Vlad Rozov <v.ro...@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > IMO, it may be useful to provide checkpoint statistics, for
> > > > > > > example, the total size of the checkpoint for a particular
> > > > > > > window or the average size of checkpoints for a particular
> > > > > > > operator. Also, how long it takes to write checkpoints to
> > > > > > > storage.
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > > > Vlad
> > > > > >
> > > > >
> > > >
> > >
> >
>
