[ https://issues.apache.org/jira/browse/SPARK-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378073#comment-14378073 ]

Yan commented on SPARK-3306:
----------------------------

If by "global singleton object", you meant it to be in the Executor class, 
it'll have to be supported by the Executor. Besides, one application may need 
to use multiple external resources.  If you meant it to be supplied by the 
application, my understanding is, correct me if I am wrong, that an application 
can now only submit tasks to an executor along with some "static" resources 
like jar files that can be shared between different tasks.
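(For illustration only, here is a minimal sketch of the per-JVM "global singleton 
object" idea as I understand it, assuming a hypothetical ConnectionHolder object 
and a placeholder JDBC URL; none of this is existing Spark API:)

    // A JVM-level singleton on each executor: the connection is lazily opened
    // once per executor JVM and then reused by every task running in that JVM.
    import java.sql.{Connection, DriverManager}

    object ConnectionHolder {
      // Placeholder URL; lazily initialized on first use in the executor JVM.
      lazy val connection: Connection =
        DriverManager.getConnection("jdbc:postgresql://host/db")
    }

    // Task code would then grab the shared connection, e.g. inside mapPartitions:
    // rdd.mapPartitions { rows =>
    //   val conn = ConnectionHolder.connection
    //   rows.map { row => /* read or write through conn */ row }
    // }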

The need here is to have a hook so that an app can specify the connection behavior, 
while executors use the hook, if any, to 
initialize/cache/fetch-from-cache/terminate/show the "external resources".
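(To make the hook idea concrete, here is a minimal sketch of what an 
application-supplied hook and an executor-side cache might look like; the names 
ExternalResourceFactory and ExternalResourceManager are hypothetical and not 
taken from the implementation linked below:)

    // Hypothetical hook the application implements to describe how an external
    // resource (e.g. a JDBC connection) is created and torn down.
    trait ExternalResourceFactory[T] {
      def init(): T                 // establish the resource
      def terminate(res: T): Unit   // close/release the resource
    }

    // Hypothetical executor-side cache keeping resources alive across tasks.
    class ExternalResourceManager {
      import scala.collection.mutable
      private val cache = mutable.Map.empty[String, (Any, ExternalResourceFactory[_])]

      // Fetch from the cache, initializing via the application's hook on first use.
      def getOrCreate[T](name: String, factory: ExternalResourceFactory[T]): T =
        synchronized {
          cache.getOrElseUpdate(name, (factory.init(), factory))._1.asInstanceOf[T]
        }

      // Terminate every cached resource, e.g. when the executor shuts down.
      def shutdown(): Unit = synchronized {
        cache.values.foreach { case (res, factory) =>
          factory.asInstanceOf[ExternalResourceFactory[Any]].terminate(res)
        }
        cache.clear()
      }
    }

    // An application-side hook for JDBC might then look like:
    // class JdbcConnectionFactory(url: String)
    //     extends ExternalResourceFactory[java.sql.Connection] {
    //   def init(): java.sql.Connection = java.sql.DriverManager.getConnection(url)
    //   def terminate(conn: java.sql.Connection): Unit = conn.close()
    // }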

In summary, there will be a pool. The question is whether an application, which 
today is very task-oriented except for "static external resource" usage such as 
jars, can be given the capability to manage the lifecycles of cross-task 
external resources.

We have an initial implementation at 
https://github.com/Huawei-Spark/spark/tree/SPARK-3306. Please feel free to take 
a look and share your advice. Note that this is not a complete implementation, 
but for experimental purposes it works, at least for JDBC connections.

> Addition of external resource dependency in executors
> -----------------------------------------------------
>
>                 Key: SPARK-3306
>                 URL: https://issues.apache.org/jira/browse/SPARK-3306
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Yan
>
> Currently, Spark executors only support static and read-only external 
> resources, namely side files and jar files. With emerging disparate data 
> sources, there is a need to support more versatile external resources, such as 
> connections to data sources, to facilitate efficient data access to those 
> sources. For one, the JDBCRDD, with some modifications, could benefit from 
> this feature by reusing JDBC connections previously established in the same 
> Spark context.


