I don't think there is a mechanism for this yet, but I agree it would be a
good addition.

Between JobManager and TaskManagers, the JARs are cached. The TaskManagers
receive hashes of the JARs only, and only load them if they do not already
have them. The same mechanism should be used for the Client to upload JARs
to the JobManager - that way, they would be transferred only once.
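
To illustrate the idea, here is a rough sketch of hash-based caching. The
class and method names (JarCacheSketch, Receiver, Sender, submit) are
made up for this mail and are not Flink's actual classes, and SHA-1 is just
an assumed choice of hash. The sender first offers the hash and only
transfers the bytes if the receiver does not know that hash yet:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a content-addressed JAR cache; not Flink code. */
class JarCacheSketch {

    /** Receiving side: keeps JAR bytes keyed by their content hash. */
    static class Receiver {
        private final Map<String, byte[]> cache = new HashMap<>();

        boolean has(String hash) {
            return cache.containsKey(hash);
        }

        void store(String hash, byte[] jar) {
            cache.put(hash, jar.clone());
        }
    }

    /** Sending side: counts how often JAR bytes were really transferred. */
    static class Sender {
        int uploads = 0;

        void submit(Receiver receiver, byte[] jar) {
            String hash = sha1Hex(jar);
            // Only ship the bytes if the receiver has not seen this hash.
            if (!receiver.has(hash)) {
                receiver.store(hash, jar);
                uploads++;
            }
        }
    }

    /** Hex-encoded SHA-1 of the given bytes (hash choice is an assumption). */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Submitting the same JAR bytes twice through one Sender to the same Receiver
increments "uploads" only once; a JAR with different content is transferred
again because its hash differs.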

For now, a workaround is to put the user JARs directly into the "lib"
directory of the Flink installation. That way they are available to every
worker without any uploads per job. Your RemoteExecutionEnvironment would
then not need to be given any JARs at all.

Would the workaround work for you for now?

Greetings,
Stephan


On Thu, Sep 24, 2015 at 1:31 PM, Hanan Meyer <ha...@scalabill.it> wrote:

> Hello All
>
> I use Flink in order to filter data from Hdfs and write it back as CSV.
>
> I keep getting the "Checking and uploading JAR files" on every DataSet
> filtering action or
> executionEnvironment execution.
>
> I use ExecutionEnvironment.createRemoteEnvironment(ip+jars..) because I
> launch Flink from
> a J2EE Application Server.
>
> The serialization and transfer of the JARs takes up a huge part of the
> execution time.
> Is there a way to force Flink to pass the JARs only once?
>
> Please advise
>
> Thanks,
>
> Hanan Meyer
>
