Ross Brigoli created SPARK-27048:
------------------------------------

             Summary: A way to execute functions on Executor Startup and 
Executor Exit in Standalone
                 Key: SPARK-27048
                 URL: https://issues.apache.org/jira/browse/SPARK-27048
             Project: Spark
          Issue Type: Wish
          Components: Deploy, Spark Core, Spark Submit
    Affects Versions: 2.3.3, 2.3.1
            Reporter: Ross Brigoli


*Background*

We have a Spark Standalone ETL workload that is heavily dependent on Apache 
Ignite KV store for lookup/reference data. There are hundreds (400+) of lookup 
data some are up to 300K records. We formerly used broadcast variables but 
later found out that it was not fast enough.

So we decided implement a caching mechanism by retrieving reference data from 
JDBC source and put them in-memory through Apache ignite as replicated cache. 
Each Spark worker node is also running an Ignite node (JVM). Then we let the 
spark executors retrieve the data from Ignite through "shared memory port". 
This is very fast but is causing instability in the Ignite cluster. The reason 
is that when the Spark executor JVM terminates, the Ignite Data Grid is 
terminated abnormally. This makes the Ignite cluster wait for the client node 
(which is the spark executor) to reconnect making the Ignite cluster 
non-responsive for a while.

*Wish*

We have this need for an ability to close the ignite client node gracefully 
just before the Executor process ends. So a feature that makes it possible to 
pass an EventHandler for "executor.onStart" and "executor.exitExecutor()" would 
be really really useful.

It could be a spark-submit argument or an entry in the spark-defaults.conf that 
looks something like:

{{spark.executor.startUpClass=com.company.ExecutorInitializer}}{{spark.executor.shutdownClass=com.company.ExecutorCleaner}}

The class will have to implement an interface provided by Spark. This class can 
then be loaded dynamically in the CoarseGrainedExecutorBackend and called on 
the onStart() and exitExecutor() methods respectively

This is also useful for opening and closing JDBC connections per executor 
instead of per partition.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to