[ 
https://issues.apache.org/jira/browse/SPARK-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303030#comment-15303030
 ] 

Catalin Alexandru Zamfir edited comment on SPARK-15582 at 5/26/16 10:08 PM:
----------------------------------------------------------------------------

Nope. Well, I'm trying to understand how/if that's possible, if a solution 
exists or if it doesn't fit the picture alltogheter. If Java 8 Lamdas work, 
then closures which are based on the same SAM principle, should also work, no? 
SPARK-2171 advertises that Groovy closures work, but most probably the tests 
were done in local[] mode, where the classes do exist.

I see the Groovy script executing the first stage (cache) which only invokes 
Java-code, but when the second step needs executing (with the flatMap closure 
written in Groovy) ... that doesn't translate to something the executors can 
understand/find.

Solutions of sending the text/byte-code around to make the inner classes of the 
script visible to the executors at run-time are also what I'm thinking, but I 
see them as work-arounds rather than first-class citizens of the framework. I'd 
like to help build this support if it does not exist in Spark, or at least 
document how it's possible to make them work.

For this however I need some guidance on where to look, what to hack at to try 
to make it work :) ...


was (Author: antauri):
Nope. Well, I'm trying to understand how/if that's possible, if a solution 
exists or if it doesn't fit the picture alltogheter. If Java 8 Lamdas work, 
then closures which are based on the same SAM principle, should also work, no? 

I see the Groovy script executing the first stage (cache) which only invokes 
Java-code, but when the second step needs executing (with the flatMap closure 
written in Groovy) ... that doesn't translate to something the executors can 
understand/find.

Solutions of sending the text/byte-code around to make the inner classes of the 
script visible to the executors at run-time are also what I'm thinking, but I 
see them as work-arounds rather than first-class citizens of the framework. I'd 
like to help build this support if it does not exist in Spark, or at least 
document how it's possible to make them work.

For this however I need some guidance on where to look, what to hack at to try 
to make it work :) ...

> Support for Groovy closures
> ---------------------------
>
>                 Key: SPARK-15582
>                 URL: https://issues.apache.org/jira/browse/SPARK-15582
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Java API
>    Affects Versions: 1.6.1, 1.6.2, 2.0.0
>         Environment: 6 node Debian 8 based Spark cluster
>            Reporter: Catalin Alexandru Zamfir
>
> After fixing SPARK-13599 and running one of our jobs against this fix for 
> Groovy dependencies (which indeed it fixed), we see the Spark executors stuck 
> at a ClassNotFound exception when running as a Script (via 
> GroovyShell.evalute (scriptText)). It seems Spark cannot de-serialize the 
> closure, or the closure is not received by the executor.
> {noformat}
> sparkContext.binaryFiles (ourPath).flatMap ({ onePathEntry -> code-block } as 
> FlatMapFunction).count ();
> { onePathEntry -> code-block } denotes a Groovy closure.
> {noformat}
> There is a groovy-spark example @ 
> https://github.com/bunions1/groovy-spark-example ... However the above uses a 
> modified Groovy. If my understanding is correct, Groovy compiles to native 
> byte-code, which should be easy for Spark to pick-up and use closures.
> The above example code fails with this stack-trace:
> {noformat}
> Caused by: java.lang.ClassNotFoundException: Script1$_run_closure1
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:348)
>       at 
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>       at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>       at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>       at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>       at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>       at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>       at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>       at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>       at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>       at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>       at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>       at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>       at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>       at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>       at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>       at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>       at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
>       at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>       at org.apache.spark.scheduler.Task.run(Task.scala:89)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214
> {noformat}
> Any ideas on how to tackle this, welcomed. I've tried Googling around for 
> similar issues, but nobody has found a solution.
> At least, point me on where to "hack" to make Spark support closures and I'd 
> share some of my time to make it work. There is SPARK-2171 arguing that 
> support for this is out of the box, but for projects of a relative complex 
> size where the driver application is contained/part-of a bigger application 
> and running on a cluster, things do not seem to work. I don't know if 
> SPARK-2171 has tried to run outside of a local[] cluster set-up where such 
> issues can arise.
> I saw a couple of people trying to make it to work, but again, they look to 
> work-arounds (eg. distribution of byte-code manually before needed, adding a 
> JAR with addJar and other work-arounds).
> Can this be done? Where can we look?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to