[
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976902#comment-15976902
]
Michael Schmeißer edited comment on SPARK-650 at 4/20/17 3:31 PM:
------------------------------------------------------------------
In a nutshell, we have our own class `MySerializer`, which is derived from
`org.apache.spark.serializer.JavaSerializer` and performs our custom
initialization in `MySerializer#newInstance` before calling the super method,
`org.apache.spark.serializer.JavaSerializer#newInstance`.
Then, when building the SparkConf for initialization of the SparkContext, we
add `pSparkConf.set("spark.closure.serializer",
MySerializer.class.getCanonicalName());`.
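The initialization-hook pattern described above can be sketched without any Spark dependencies as follows. This is a minimal illustration, not the actual GfK code: `BaseSerializer` stands in for `org.apache.spark.serializer.JavaSerializer`, and the body of `initOnce` is a placeholder for whatever executor-side setup is needed.

```java
// Stand-in for org.apache.spark.serializer.JavaSerializer: a base class
// whose newInstance() factory method we want to intercept.
class BaseSerializer {
    public Object newInstance() {
        return new Object(); // stands in for a real SerializerInstance
    }
}

// The subclass runs one-time custom initialization before delegating to
// the parent's factory method, exactly as the comment describes.
class MySerializer extends BaseSerializer {
    static volatile boolean initialized = false;

    private static synchronized void initOnce() {
        if (!initialized) {
            // custom executor-side initialization would go here
            initialized = true;
        }
    }

    @Override
    public Object newInstance() {
        initOnce();                 // run our hook first...
        return super.newInstance(); // ...then defer to the parent
    }
}
```

In the real setup, this class is then registered on the SparkConf via `pSparkConf.set("spark.closure.serializer", MySerializer.class.getCanonicalName());` as shown above, so the hook runs the first time the closure serializer is instantiated on each executor.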
We package this with our application JAR and it works, so I think you have to
look at your classpath configuration, [~mboes]. In our case, the JAR which
contains the closure serializer is listed in the following properties:
* driver.extraClassPath
* executor.extraClassPath
* yarn.secondary.jars
* spark.yarn.secondary.jars
* spark.driver.extraClassPath
* spark.executor.extraClassPath
If I recall correctly, the variants without the "spark." prefix are produced
by us: we prefix all of our properties with "spark." to transfer them via
Oozie and unmask them again later, so you should only need the properties
with the "spark." prefix.
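The prefix-and-unmask step can be illustrated with a small sketch. The class and method names here are hypothetical, not the actual GfK implementation: the idea is simply that properties carry a "spark." prefix through Oozie and have it stripped again on the other side.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the "unmask" step described above:
// keys that were prefixed with "spark." for the Oozie hand-off get their
// original names restored; already-unprefixed keys pass through unchanged.
class PropertyMasker {
    static final String PREFIX = "spark.";

    static Map<String, String> unmask(Map<String, String> masked) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : masked.entrySet()) {
            String key = e.getKey();
            String original = key.startsWith(PREFIX)
                    ? key.substring(PREFIX.length())
                    : key;
            out.put(original, e.getValue());
        }
        return out;
    }
}
```

This would explain why both `driver.extraClassPath` and `spark.driver.extraClassPath` appear in the property list above: one is the masked transport form, the other the unmasked result.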
Regarding [~riteshtijoriwala]'s questions: 1) Please see the related issue
SPARK-1107. 2) You can add a TaskCompletionListener with
`org.apache.spark.TaskContext#addTaskCompletionListener(org.apache.spark.util.TaskCompletionListener)`.
To get the current TaskContext on an executor, just use
`org.apache.spark.TaskContext#get`. We have some functionality that logs the
progress of a function at fixed intervals (e.g. every 1,000 records); to do
this, you can use mapPartitions with a custom iterator.
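The custom-iterator idea can be sketched without Spark dependencies as follows. The class name is made up; inside Spark, an instance of such a wrapper would be returned from the function passed to `mapPartitions`, wrapping the partition's own iterator.

```java
import java.util.Iterator;

// Wraps a partition's iterator and emits a progress message every
// `interval` records while passing each record through unchanged.
class ProgressIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final long interval;
    private long count = 0;

    ProgressIterator(Iterator<T> delegate, long interval) {
        this.delegate = delegate;
        this.interval = interval;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T record = delegate.next();
        if (++count % interval == 0) {
            System.out.println("processed " + count + " records");
        }
        return record;
    }

    long processed() {
        return count;
    }
}
```

With Spark this would be used roughly as `rdd.mapPartitions(it -> new ProgressIterator<>(it, 1000))`, so the logging happens lazily as the partition is consumed, without materializing it.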
> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
> Key: SPARK-650
> URL: https://issues.apache.org/jira/browse/SPARK-650
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Matei Zaharia
> Priority: Minor
>
> Would be useful to configure things like reporting libraries
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]