[ https://issues.apache.org/jira/browse/SPARK-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377414#comment-14377414 ]
Yan commented on SPARK-3306:
----------------------------
The "external resource" would primarily let different tasks on the same
executor reuse a resource such as a DB connection, avoiding the latency of
reconnecting in every task. It differs from the existing static "resources",
such as jar files and other files, in that its handles or identifiers have to
be kept in memory and the executor process has to provide a mechanism for its
tasks to access them. The current "static resources" pose no such problem
because they are identified by disk locations and tasks can simply read them
from disk.
All of this is dynamic in nature and much more complex than jars/files, so I
feel the executors would need to be modified/enhanced.
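As a rough illustration of the reuse idea only, and not a proposed Spark API,
the sketch below keeps live handles in a process-wide registry so that tasks
running in the same process share one connection per key. Every name in it
(ExecutorResourceRegistry, FakeConnection, run_task, REGISTRY) is hypothetical,
and FakeConnection merely stands in for a real DB connection.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class ExecutorResourceRegistry:
    """Hypothetical in-memory cache of live resource handles for one process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._resources: Dict[Hashable, Any] = {}

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        # The lock ensures that concurrent tasks create a given resource once.
        with self._lock:
            if key not in self._resources:
                self._resources[key] = factory()
            return self._resources[key]

    def close_all(self) -> None:
        # Detach the handles under the lock, then close them outside it.
        with self._lock:
            resources = list(self._resources.values())
            self._resources.clear()
        for resource in resources:
            close = getattr(resource, "close", None)
            if callable(close):
                close()


class FakeConnection:
    """Stand-in for a DB connection; counts how many were opened."""

    opened = 0

    def __init__(self, url: str) -> None:
        FakeConnection.opened += 1
        self.url = url
        self.closed = False

    def close(self) -> None:
        self.closed = True


# One registry per process, playing the role of the executor (an assumption).
REGISTRY = ExecutorResourceRegistry()


def run_task(task_id: int, url: str) -> FakeConnection:
    # Each task asks the registry for the handle instead of reconnecting.
    return REGISTRY.get_or_create(("jdbc", url), lambda: FakeConnection(url))
```

Two calls to run_task with the same URL return the same object, so only one
connection is opened. Unlike a jar or side file, the handle exists only in the
memory of that process, which is why the executor itself would have to own the
registry and its cleanup.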
I have not been able to spend as much time on this as I promised because of
other Spark SQL work. Hopefully I can provide more concrete details for
discussion soon.
> Addition of external resource dependency in executors
> -----------------------------------------------------
>
> Key: SPARK-3306
> URL: https://issues.apache.org/jira/browse/SPARK-3306
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Yan
>
> Currently, Spark executors only support static and read-only external
> resources of side files and jar files. With emerging disparate data sources,
> there is a need to support more versatile external resources, such as
> connections to data sources, to facilitate efficient data accesses to the
> sources. For example, JDBCRDD, with some modifications, could benefit from
> this feature by reusing JDBC connections previously established within the
> same Spark context.