Re: Spark and Stanford CoreNLP

Christopher Manning Tue, 25 Nov 2014 08:06:01 -0800

I’m not (yet!) an active Spark user, but I saw this thread on Twitter … and I am 
involved with Stanford CoreNLP.


Could someone explain what CoreNLP would need to do to work better with Spark? 
That would be a useful goal.

That is, while Stanford CoreNLP is not quite uniform (being developed by 
various people for over a decade), the general approach has always been that 
models should be serializable but that processors should not be. This makes 
sense to me intuitively. It doesn’t really make sense to serialize a processor, 
which often has large mutable data structures used for processing.

But does that not work well with Spark? Do processors need to be serializable, 
so that one then has to go through and mark all the fields of the processor as 
transient?

Or what?

Thanks!

Chris


> On Nov 25, 2014, at 7:54 AM, Evan Sparks <evan.spa...@gmail.com> wrote:
> 
> If you only mark it as transient, then the object won't be serialized, and on 
> the worker the field will be null. When the worker goes to use it, you get an 
> NPE. 
> 
> Marking it lazy defers initialization to first use. If that use happens to be 
> after serialization time (e.g. on the worker), then the worker will first 
> check to see if it's initialized, and then initialize it if not. 
> 
> I think if you *do* reference the lazy val before serializing you will likely 
> get an NPE. 
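
The difference between the two cases quoted above can be reproduced without Spark. What follows is only a minimal sketch: `Processor` is a hypothetical stand-in for a heavyweight, non-serializable object such as a CoreNLP pipeline (it is not a real CoreNLP class), and `roundTrip` uses plain Java serialization as a rough approximation of what happens when an object is shipped to a worker. The names `TransientOnly`, `TransientLazy` and `TransientLazyDemo` are made up for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a heavyweight, non-serializable processor.
// Not a real CoreNLP class; it only counts whitespace-separated tokens.
class Processor {
  def annotate(text: String): Int = text.split("\\s+").length
}

// Transient only: the field is skipped when the object is serialized,
// and the constructor does not run again on deserialization.
class TransientOnly extends Serializable {
  @transient val processor: Processor = new Processor
}

// Transient and lazy: the field is skipped when the object is serialized,
// and it is built on first access instead of in the constructor.
class TransientLazy extends Serializable {
  @transient lazy val processor: Processor = new Processor
}

object TransientLazyDemo {
  // Serialize to bytes and read the object back, as an approximation
  // of sending it from the driver to a worker.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // The plain transient field is null after the round trip,
    // so calling a method on it would throw a NullPointerException.
    val eager = roundTrip(new TransientOnly)
    println(s"transient only, field is null: ${eager.processor == null}")

    // The lazy field is initialized on first use, on the receiving side.
    val deferred = roundTrip(new TransientLazy)
    println(s"tokens: ${deferred.processor.annotate("Spark and Stanford CoreNLP")}")
  }
}
```

In a Spark job the same idea would usually mean wrapping the processor in a small serializable holder like `TransientLazy` and touching the lazy field only inside the function that runs on the workers, so that each worker builds its own instance.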
> 
> 
>> On Nov 25, 2014, at 1:05 AM, Theodore Vasiloudis 
>> <theodoros.vasilou...@gmail.com> wrote:
>> 
>> Great, Ian's approach seems to work fine.
>> 
>> Can anyone provide an explanation as to why this works, but passing the
>> CoreNLP object itself
>> as transient does not?
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-Stanford-CoreNLP-tp19654p19739.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>> 
