+1 for this feature. Knowing the size of each operator's checkpointed state
and the time taken to checkpoint it will help with tuning and with
understanding any checkpointing overhead.


-Venkatesh.

> On Sep 25, 2016, at 10:56 PM, Chinmay Kolhatkar <chin...@datatorrent.com> wrote:
> 
> +1. Very useful feature. We should also provide documentation on how to use
> that information for tuning.
> 
> On Sun, Sep 25, 2016 at 11:27 PM, Thomas Weise <thomas.we...@gmail.com> wrote:
> 
>> +1. Very useful during tuning and for ongoing monitoring of the cost of
>> checkpointing (both serialization and I/O). It can also be used to
>> identify skew.
>> 
>> --
>> sent from mobile
>> On Sep 25, 2016 9:10 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
>> 
>>> We've seen cases where operator state continues to grow without bound,
>>> either because the developer was unaware of the importance of keeping
>>> state small or because of some anomaly downstream. In such cases, the
>>> operators can get killed with an OOM exception because checkpoints are
>>> building up in memory faster than they can be written to disk.
>>> 
>>> In such cases, these stats may help identify the root cause of the
>>> failure.
>>> 
>>> Ram
>>> 
>>> On Sun, Sep 25, 2016 at 7:39 AM, Sandesh Hegde <sand...@datatorrent.com> wrote:
>>> 
>>>> Say it takes x MB and y seconds to do the checkpoint. What does the
>>>> user do with that information?
>>>> 
>>>> On Sun, Sep 25, 2016, 6:51 AM Tushar Gosavi <tus...@datatorrent.com> wrote:
>>>> 
>>>>> +1
>>>>> 
>>>>> -Tushar
>>>>> 
>>>>> On Sun, Sep 25, 2016, 8:54 AM Sanjay Pujare <san...@datatorrent.com> wrote:
>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> Sanjay
>>>>>> 
>>>>>> 
>>>>>> On Sun, Sep 25, 2016 at 7:06 AM, Devendra Tagare <devend...@datatorrent.com> wrote:
>>>>>> 
>>>>>>> +1
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Dev
>>>>>>> 
>>>>>>> On Sep 25, 2016 1:17 AM, "Pramod Immaneni" <pra...@datatorrent.com> wrote:
>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>>> On Sep 24, 2016, at 10:01 AM, Vlad Rozov <v.ro...@datatorrent.com> wrote:
>>>>>>>>> 
>>>>>>>>> IMO, it may be useful to provide checkpoint statistics, for example
>>>>>>>>> the total size of checkpoints for a particular window or the average
>>>>>>>>> size of checkpoints for a particular operator, as well as how long
>>>>>>>>> it takes to write checkpoints to storage.
>>>>>>>>> 
>>>>>>>>> Thank you,
>>>>>>>>> 
>>>>>>>>> Vlad
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
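A minimal sketch of how the statistics proposed in this thread (total
checkpoint size per window, average size per operator, write time) could be
analysed offline to answer "what does the user do with that information?".
The CheckpointStat record and every function name below are hypothetical;
they are not part of the Apex API and assume the platform exposes one record
per operator (or partition) per checkpoint.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class CheckpointStat:
    """Hypothetical checkpoint record for one operator (or partition)."""
    operator_id: int
    window_id: int       # window at which the checkpoint was taken
    size_bytes: int      # serialized size of the operator state
    write_millis: float  # time to serialize and write it to storage


def total_size_per_window(stats):
    """Total checkpoint size across all operators, keyed by window id."""
    totals = {}
    for s in stats:
        totals[s.window_id] = totals.get(s.window_id, 0) + s.size_bytes
    return totals


def average_size_per_operator(stats):
    """Mean checkpoint size of each operator over all recorded windows."""
    sizes = {}
    for s in stats:
        sizes.setdefault(s.operator_id, []).append(s.size_bytes)
    return {op: mean(values) for op, values in sizes.items()}


def checkpoint_overhead(write_millis, checkpoint_interval_millis):
    """Fraction of each checkpoint interval spent writing the checkpoint."""
    return write_millis / checkpoint_interval_millis


def is_growing(stats, operator_id, min_samples=3):
    """True if an operator's checkpoint size strictly increases every time.

    A crude signal for the unbounded state growth described in the thread.
    """
    sizes = [s.size_bytes
             for s in sorted(stats, key=lambda s: s.window_id)
             if s.operator_id == operator_id]
    return len(sizes) >= min_samples and all(
        b > a for a, b in zip(sizes, sizes[1:]))


def skewed_operators(stats, factor=2.0):
    """Operators whose average checkpoint size exceeds factor x the median."""
    averages = average_size_per_operator(stats)
    threshold = factor * median(averages.values())
    return sorted(op for op, avg in averages.items() if avg > threshold)
```

For instance, assuming a 30-second checkpoint interval (60 streaming windows
of 500 ms, which we believe are the Apex defaults), a checkpoint that takes
300 ms to write gives checkpoint_overhead(300, 30_000) == 0.01, i.e. 1% of
the interval. The skew check compares against the median rather than the
mean so that a single oversized partition cannot raise the threshold enough
to hide itself.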
