Re: Why do checkpoints work the way they do?

2017-09-12 Thread Hugo Reinwald
Thanks Tathagata for the clarification. +1 on documenting limitations of
checkpoints on structured streaming.

Hugo



Re: Why do checkpoints work the way they do?

2017-09-11 Thread Dmitry Naumenko
+1 from me on this question. If there are any constraints on restoring a
checkpoint in Structured Streaming, they should be documented.




Re: Why do checkpoints work the way they do?

2017-08-31 Thread 张万新
So are there any documents explaining under what conditions my application
can recover from the same checkpoint, and under what conditions it cannot?



Re: Why do checkpoints work the way they do?

2017-08-29 Thread Tathagata Das
Hello,

This is an unfortunate design on my part when I was building DStreams :)

Fortunately, we learnt from our mistakes and built Structured Streaming the
correct way. Checkpointing in Structured Streaming stores only the progress
information (offsets, etc.), and the user can change their application code
(within certain constraints, of course) and still restart from checkpoints
(unlike DStreams). If you are just building out your streaming
applications, then I highly recommend you try out Structured Streaming
instead of DStreams (which is effectively in maintenance mode).
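
For illustration, a minimal sketch of such a query (the Kafka servers,
topic name, and paths below are placeholders, not from this thread): the
directory given as checkpointLocation holds only progress data, so the
transformation logic can change between restarts, within the documented
constraints.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("StructuredKafkaSketch")
  .getOrCreate()

// Read from Kafka; servers and topic are placeholders.
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// This transformation can be modified between restarts; the checkpoint
// directory records only source offsets and sink commit logs.
val output = input.selectExpr("CAST(value AS STRING) AS value")

output.writeStream
  .format("parquet")
  .option("path", "/data/output")
  .option("checkpointLocation", "/tmp/checkpoint-ss")
  .start()
  .awaitTermination()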




Why do checkpoints work the way they do?

2017-08-25 Thread Hugo Reinwald
Hello,

I am implementing a Spark Streaming solution with Kafka and read that
checkpoints cannot be used across application code changes - here


I tested changes in the application code and got the error message below:

17/08/25 15:10:47 WARN CheckpointReader: Error reading checkpoint from file
file:/tmp/checkpoint/checkpoint-150364116.bk
java.io.InvalidClassException: scala.collection.mutable.ArrayBuffer; local
class incompatible: stream classdesc serialVersionUID =
-2927962711774871866, local class serialVersionUID = 1529165946227428979
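
For context, the recovery code follows the standard getOrCreate pattern,
roughly like the sketch below (app name, batch interval, and stream setup
are placeholders). Recovery deserializes the entire checkpointed DStream
graph with Java serialization, which is where the InvalidClassException
above comes from once the compiled classes change.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "file:/tmp/checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("KafkaDStreamSketch")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... set up the Kafka direct stream and transformations here ...
  ssc
}

// On restart, getOrCreate reads the serialized DStream graph back from
// the checkpoint; if the classes referenced in that graph have changed,
// Java deserialization fails with InvalidClassException, as above.
val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
ssc.start()
ssc.awaitTermination()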

While I understand that this is by design, can I ask why checkpointing
works the way it does, verifying class signatures? Would it not be easier to
let the developer decide whether to reuse the old checkpoints, depending on
what changed in the application logic, e.g. changes unrelated to Spark/Kafka
such as logging or configuration changes?

This is my first post in the group. Apologies if I am asking the question
again; I did a Nabble search and it didn't turn up the answer.

Thanks for the help.
Hugo