[ https://issues.apache.org/jira/browse/SPARK-13599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302921#comment-15302921 ]

Catalin Alexandru Zamfir edited comment on SPARK-13599 at 5/26/16 9:21 PM:
---------------------------------------------------------------------------

The environment is configured properly. The executors are configured with the 
spark.executor.extraClassPath directory where Groovy 2.4.6 resides. We would 
not have seen the first exception (conflicting modules) if 2.4.6 were not being 
detected. The executors definitely have Groovy on their classpaths: they 
already load some of our own JARs from the same folder that is set on the 
classpath for them.
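
For reference, the relevant setting looks like this (illustrative only; the 
/opt/libs path is an assumption, and note the property name is singular, 
spark.executor.extraClassPath, not spark.executors.*):

```
# spark-defaults.conf (sketch; directory path is hypothetical)
spark.executor.extraClassPath  /opt/libs/*
spark.driver.extraClassPath    /opt/libs/*
```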

The Groovy script is sent to the driver application (via a Spring 
@RequestMapping HTTP POST body), and the first stage of the job, which doesn't 
rely on any Groovy closure/code (maps/flatMaps/filters that are already in the 
Java code), executes fine. On the second stage, which is the flatMap 
({groovy closure}) above, it fails with the ClassNotFoundException.

I expected closures to be sent the same way Java 8 lambdas are (serialized 
over the wire) so that executors can deserialize them on their side. When run 
in local[] mode, no issues arise and everything works; in standalone (cluster) 
mode, the issue appears.
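
A minimal sketch of the mechanism I had in mind, using only the plain JDK (no 
Spark; the class and interface names are hypothetical): a Java 8 lambda 
targeting a Serializable interface round-trips through Java serialization, but 
readObject still resolves the lambda's capturing class by name, so the 
receiving JVM needs that class file on its classpath. A Groovy closure class 
generated at runtime on the driver has no class file on the executors, which 
would explain the ClassNotFoundException:

```java
import java.io.*;

public class LambdaWireDemo {
    // A function type that is also Serializable, like Spark's Java API uses.
    interface SerFn extends java.util.function.Function<Integer, Integer>, Serializable {}

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static SerFn deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        // ObjectInputStream resolves the lambda's capturing class by name here;
        // if that class is missing on this JVM, ClassNotFoundException is thrown.
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SerFn) ois.readObject();
        }
    }

    // Round-trip a lambda through serialization and apply it.
    static int roundTripResult() {
        try {
            SerFn plusOne = x -> x + 1;  // serializable because SerFn extends Serializable
            return deserialize(serialize(plusOne)).apply(41);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Works in the same JVM because LambdaWireDemo is on the classpath.
        System.out.println(roundTripResult());  // prints 42
    }
}
```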

Any idea how the closures could be translated into something the Spark 
executors understand? The alternatives I'm thinking of are to distribute the 
byte-code before the job starts (and load it on the nodes), or to write the 
byte-code to a JAR file and use Spark's "addJar" method. But these, even if 
they work, are workarounds.
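
A sketch of that second workaround, with only the JAR-writing part shown in 
plain JDK (the class name, byte array, and SparkContext call are assumptions; 
in practice the bytes would come from compiling the Groovy closure on the 
driver):

```java
import java.io.*;
import java.nio.file.*;
import java.util.Map;
import java.util.jar.*;

public class ClosureJarPacker {
    /** Writes each (className -> bytes) entry into a new temp JAR and returns its path. */
    static Path packIntoJar(Map<String, byte[]> classes) {
        try {
            Path jar = Files.createTempFile("groovy-closures", ".jar");
            try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(jar))) {
                for (Map.Entry<String, byte[]> e : classes.entrySet()) {
                    // Class file entry name: demo.MyClosure -> demo/MyClosure.class
                    jos.putNextEntry(new JarEntry(e.getKey().replace('.', '/') + ".class"));
                    jos.write(e.getValue());
                    jos.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical byte-code; real bytes would come from the Groovy compiler.
        Path jar = packIntoJar(Map.of("demo.MyClosure", new byte[] {(byte) 0xCA, (byte) 0xFE}));
        System.out.println(Files.exists(jar));
        // With a real SparkContext you would then register it before the stage runs:
        // sc.addJar(jar.toUri().toString());
    }
}
```

Whether executors pick the classes up in time for an already-submitted stage is 
exactly the open question above.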

Since the above issue is fixed, is there anything you could share on how to 
make the executors accept a Groovy closure? Or at least find it on the 
classpath before it is needed?



> Groovy-all ends up in spark-assembly if hive profile set
> --------------------------------------------------------
>
>                 Key: SPARK-13599
>                 URL: https://issues.apache.org/jira/browse/SPARK-13599
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>             Fix For: 1.6.2, 2.0.0
>
>
> If you do a build with {{-Phive,hive-thriftserver}} then the contents of 
> {{org.codehaus.groovy:groovy-all}} gets into the spark-assembly.jar
> This is bad because
> * it makes the JAR bigger
> * it makes the build longer
> * it's an uber-JAR itself, so can include things (maybe even conflicting 
> things)
> * It's something else that needs to be kept up to date security-wise



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
