[jira] [Updated] (FLINK-1789) Allow adding of URLs to the usercode class loader

Timo Walther (JIRA) Wed, 30 Sep 2015 08:11:33 -0700

     [ 
https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Timo Walther updated FLINK-1789:
--------------------------------
    Description: 
Currently, there is no option to add custom classpath URLs to the 
FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even if 
they are already present on all nodes.

It would be great if RemoteEnvironment also accepted valid classpath URLs and 
forwarded them to BlobLibraryCacheManager.

The problem with the current approach is that code loaded by the regular 
JVM class loader cannot refer to job-specific types (which are accessible only 
at the UserCodeClassLoader level). Unfortunately, this is exactly what happens 
if we use the classpath entry to generate the dataflows dynamically at runtime.
Currently this has to be done with "hacks" (hardcoding a filesystem path next 
to the list of JARs when initializing the BlobManager entry). It makes sense to 
open an issue that makes this list parameterizable via an additional 
ExecutionEnvironment argument (this is basically the only missing feature that 
prevents the use of the Emma project with "off-the-shelf" Flink).
This, of course, would require that the folders are shared (e.g. via NFS) 
between client, master and workers. I think what made Stephan so excited is the 
idea of using the same URL mechanism to ship the code to all dependent 
parties (most probably by running a dedicated HTTP or FTP server on the client).
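
The visibility problem described above can be reproduced with plain JDK class 
loaders. The following is only a sketch and does not use Flink's classes: the 
class name LoaderVisibilitySketch and the temporary directory are hypothetical, 
and a resource file stands in for a job-specific type, since resource lookup 
follows the same parent-first delegation as class lookup. The child loader 
plays the role of the user code class loader, the system class loader the role 
of the regular JVM class loader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not part of Flink.
public class LoaderVisibilitySketch {

    /**
     * Creates a resource that exists only on the child loader's path and
     * returns { visibleToParent, visibleToChild }.
     */
    public static boolean[] checkVisibility() {
        try {
            // Stand-in for a directory that holds job-specific code.
            Path dir = Files.createTempDirectory("usercode");
            String name = "job-specific-" + System.nanoTime() + ".txt";
            Files.write(dir.resolve(name), "stand-in".getBytes());

            ClassLoader parent = ClassLoader.getSystemClassLoader();
            // toUri() on an existing directory yields a URL ending in "/",
            // which URLClassLoader treats as a directory of classes/resources.
            URL[] urls = { dir.toUri().toURL() };

            try (URLClassLoader child = new URLClassLoader(urls, parent)) {
                boolean visibleToParent = parent.getResource(name) != null;
                boolean visibleToChild = child.getResource(name) != null;
                return new boolean[] { visibleToParent, visibleToChild };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] visible = checkVisibility();
        System.out.println("parent sees it: " + visible[0]);
        System.out.println("child sees it:  " + visible[1]);
    }
}
```

Lookups delegate from child to parent but never the other way around, so 
anything placed on the regular JVM classpath cannot reach what only the child 
loader knows about.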

We make the following assumptions for the use case where we need the global 
class path:
- Either the URL is a file path that points to a directory accessible to all 
nodes (e.g. via NFS), and the client runs in the cluster as well.
- Or the URL is an HTTP (or similar) URL that points to a file server serving 
the classes, so that setups without shared directories work too.
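
To make the two assumptions above concrete, here is a minimal sketch built on 
the JDK's java.net.URLClassLoader rather than on Flink's own class loader. The 
class GlobalClasspathSketch, its method names, and the example path and host 
are hypothetical and not part of any existing Flink API. It only shows how 
shipped JARs and additional classpath URLs could end up in one loader.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of Flink.
public class GlobalClasspathSketch {

    /** Parses classpath strings into URLs, rejecting anything that is not a valid URL. */
    public static List<URL> parseClasspaths(List<String> specs) {
        List<URL> urls = new ArrayList<>();
        for (String spec : specs) {
            try {
                urls.add(URI.create(spec).toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Invalid classpath URL: " + spec, e);
            }
        }
        return urls;
    }

    /** Builds one loader that searches the shipped JARs first, then the extra classpath URLs. */
    public static URLClassLoader createLoader(
            List<URL> shippedJars, List<URL> globalClasspaths, ClassLoader parent) {
        List<URL> all = new ArrayList<>(shippedJars);
        all.addAll(globalClasspaths);
        return new URLClassLoader(all.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws Exception {
        List<URL> classpaths = parseClasspaths(Arrays.asList(
                // Assumption 1: directory shared by all nodes (made-up path).
                "file:///shared/nfs/job-classes/",
                // Assumption 2: file server reachable from all nodes (made-up host).
                "http://fileserver.example.com:8080/classes/"));

        try (URLClassLoader loader = createLoader(
                new ArrayList<>(), classpaths, GlobalClasspathSketch.class.getClassLoader())) {
            for (URL url : loader.getURLs()) {
                System.out.println(url);
            }
        }
    }
}
```

Note that URLClassLoader treats a URL ending in "/" as a directory and any 
other URL as a JAR file, so directory classpaths need the trailing slash.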

  was:
Currently, there is no option to add custom classpath URLs to the 
FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even if 
they are already present on all nodes.

It would be great if RemoteEnvironment also accepted valid classpath URLs and 
forwarded them to BlobLibraryCacheManager.

The problem with the current approach is that the code loaded by the regular 
JVM class loader cannot refer to job specific types (which can be accessed only 
at the UserCodeClassLoader level). Unfortunately, this is the case if we use 
the classpath entry to generate the dataflows dynamically at runtime.
Currently this functionality needs to be done by "hacks" (hardcode a filesystem 
path next to the list of jars when initializing the BlobManager entry). It 
makes sense to open an issue which makes this list parameterizable via an 
additional ExecutionEnvironment argument (this is basically the only main 
feature which prohibits the use of Emma project with "off-the-shelf" Flink).
This, of course, would require that the folders are shared (e.g. via NFS) 
between client, master and workers. I think what made Stephan so excited is the 
idea of using the same URL mechanism in order to ship the code to all dependent 
parties (most probably by running a dedicated HTTP or FTP server on the client).


> Allow adding of URLs to the usercode class loader
> -------------------------------------------------
>
>                 Key: FLINK-1789
>                 URL: https://issues.apache.org/jira/browse/FLINK-1789
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>            Reporter: Timo Walther
>            Assignee: Timo Walther
>            Priority: Minor
>
> Currently, there is no option to add custom classpath URLs to the 
> FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even 
> if they are already present on all nodes.
> It would be great if RemoteEnvironment also accepted valid classpath URLs and 
> forwarded them to BlobLibraryCacheManager.
> The problem with the current approach is that the code loaded by the regular 
> JVM class loader cannot refer to job specific types (which can be accessed 
> only at the UserCodeClassLoader level). Unfortunately, this is the case if we 
> use the classpath entry to generate the dataflows dynamically at runtime.
> Currently this functionality needs to be done by "hacks" (hardcode a 
> filesystem path next to the list of jars when initializing the BlobManager 
> entry). It makes sense to open an issue which makes this list parameterizable 
> via an additional ExecutionEnvironment argument (this is basically the only 
> main feature which prohibits the use of Emma project with "off-the-shelf" 
> Flink).
> This, of course, would require that the folders are shared (e.g. via NFS) 
> between client, master and workers. I think what made Stephan so excited is 
> the idea of using the same URL mechanism in order to ship the code to all 
> dependent parties (most probably by running a dedicated HTTP or FTP server on 
> the client).
> We make the following assumptions for the use case where we need the global 
> class path:
> - The URL is either a file path that points to a directory accessible to all 
> nodes (NFS or so) and the client runs in the cluster as well.
> - The URL is an HTTP URL or so that points to a file server that serves the 
> classes to work in non-shared directory settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
