“Move the functions you are passing” – yes, this is what I had already done, and
what I hope to avoid.

Thank you, however, for the reminder about @transient – with that I am able to
create a function value that carries the non-serializable state as a
@transient val, which at least packages the solution closer to the code that
causes the problem.
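
In outline (the names below are illustrative only, not the real code):

// The function object itself is Serializable so checkpointing can write it
// out, but the driver-only state is @transient and therefore skipped; after
// deserialization the field is simply null, which is fine because only the
// driver ever touches it.
class NonSerializableThing {                  // stands in for the real driver-only state
  def describe(): String = "driver-only resource"
}

class Enricher(@transient val driverState: NonSerializableThing)
    extends (String => String) with Serializable {
  def apply(record: String): String = record.toUpperCase  // workers never read driverState
}

// val enrich = new Enricher(new NonSerializableThing)
// stream.map(enrich)   // safe to checkpoint: driverState is not written out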

Cheers
Simon

From: Evan Sparks [mailto:evan.spa...@gmail.com]
Sent: 24 June 2016 16:12
To: Simon Scott <simon.sc...@viavisolutions.com>
Cc: dev@spark.apache.org
Subject: Re: Associating user objects with SparkContext/SparkStreamingContext

I would actually think about this the other way around. Move the functions you
are passing to the streaming jobs out to their own object if possible. Spark's
closure capture rules are necessarily far-reaching and serialize the object
that contains these methods, which is a common cause of the problem you're
seeing.
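
For example (the object and method names are invented for illustration):

object StreamFunctions {
  // A method on a top-level object: referencing it from a streaming
  // transformation does not drag a surrounding driver class into the closure.
  def parse(line: String): Array[String] = line.split(",")
}

// Instead of stream.map(line => this.parse(line)), which captures `this`
// (the enclosing class and everything it holds), write:
//   stream.map(StreamFunctions.parse)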

Another option is to mark the non-serializable state as "@transient" if it is 
never accessed by the worker processes.
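
A minimal sketch of that option (the class and field names are hypothetical):

class StreamingJob extends Serializable {
  // Driver-only handle; a plain Object is not Serializable, but @transient
  // tells serialization to skip this field entirely.
  @transient private val driverOnlyHandle: Object = new Object

  val delimiter: String = ","   // small, serializable, used on the workers

  def split(line: String): Array[String] = line.split(delimiter)
}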

On Jun 24, 2016, at 1:23 AM, Simon Scott <simon.sc...@viavisolutions.com> wrote:
Hi,

I am developing a streaming application using checkpointing on Spark 1.5.1

I have just run into a NotSerializableException because some of the state that
my streaming functions need cannot be serialized. This state is only used in
the driver process; it is the checkpointing that requires the serialization.

So I am considering moving that state into a Scala “object” – i.e. a global
singleton that must be mutable so that the state can be set at application
start.
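
In outline (the names here are invented for illustration):

// A Scala object exists once per JVM and is never written into the
// checkpoint, but it has to be mutable so the driver can populate it at
// application start.
object DriverState {
  @volatile var settings: Option[Map[String, String]] = None
}

// On the driver, at startup (loadSettings() is a hypothetical loader):
//   DriverState.settings = Some(loadSettings())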

I would prefer to be able to create immutable state and attach it to either the
SparkContext or SparkStreamingContext, but I can’t find an API for that.

Does anybody else think this is a good idea? Is there a better way? Or would
such an API be a useful enhancement to Spark?

Thanks in advance
Simon

Research Developer
Viavi Solutions
