[ 
https://issues.apache.org/jira/browse/BEAM-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Łukasz Gajowy updated BEAM-3370:
--------------------------------
    Description: 
Currently, when _filesToStage_ contains a path to a directory with resources, 
the Flink runner throws a {{"java.io.FileNotFoundException: <path_to_the_dir> (Is a 
directory)"}}. A way to include directory resources would be helpful.

This "blocker" occurs while trying to run IOITs on the Flink runner, which 
makes them impossible, or at least very inconvenient, to run. When the tests are run 
via the "mvn verify" command, a "test-classes" *directory* is detected by the 
detectClasspathResourcesToStage() method, which in turn causes the above error.

One way to solve this issue is to package the directories into jars with unique 
names and update the paths accordingly before staging the files on Flink. 
Something similar is already done in the Dataflow runner 
([GcsStager|https://github.com/apache/beam/blob/cd186a531aaff0b21cf009b034e1a41f7e7b64af/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/GcsStager.java#L74]),
 more specifically in the 
[PackageUtil|https://github.com/apache/beam/blob/cd186a531aaff0b21cf009b034e1a41f7e7b64af/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java#L280]
 class. Thanks to that, we are able to run the tests on Dataflow.

As I verified in a [small experiment of 
mine|https://github.com/lgajowy/beam/commits/spark-and-flink-run-tests], 
an analogous change makes it possible to run the tests on a Flink 
cluster.
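A minimal sketch of the proposed workaround (class and method names are hypothetical, not Beam's actual API): before staging, any entry in filesToStage that points at a directory is zipped into a uniquely named temporary jar, and the jar's path is staged instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Stream;

public class DirectoryStager {

  /**
   * If the given path is a directory, packages its contents into a uniquely
   * named temporary jar and returns the jar's path; otherwise returns the
   * path unchanged.
   */
  static String resolveForStaging(String path) throws IOException {
    Path source = Paths.get(path);
    if (!Files.isDirectory(source)) {
      return path; // plain files (e.g. existing jars) are staged as-is
    }
    Path jar = Files.createTempFile("staged-" + UUID.randomUUID(), ".jar");
    try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar));
        Stream<Path> files = Files.walk(source)) {
      files
          .filter(Files::isRegularFile)
          .forEach(
              file -> {
                try {
                  // Jar entries use forward slashes and paths relative to the
                  // directory root.
                  String entryName =
                      source.relativize(file).toString().replace('\\', '/');
                  out.putNextEntry(new JarEntry(entryName));
                  Files.copy(file, out);
                  out.closeEntry();
                } catch (IOException e) {
                  throw new UncheckedIOException(e);
                }
              });
    }
    return jar.toString();
  }

  public static void main(String[] args) throws IOException {
    // Simulate the "test-classes" directory that mvn verify puts on the classpath.
    Path dir = Files.createTempDirectory("test-classes");
    Files.write(dir.resolve("Foo.class"), new byte[] {1, 2, 3});
    String staged = resolveForStaging(dir.toString());
    System.out.println(staged.endsWith(".jar"));
  }
}
```

This mirrors the idea used by Dataflow's PackageUtil: directories never reach the staging step directly, so the FileNotFoundException above cannot occur.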

> Add ability to stage directories with compiled classes to Flink
> ---------------------------------------------------------------
>
>                 Key: BEAM-3370
>                 URL: https://issues.apache.org/jira/browse/BEAM-3370
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-flink
>            Reporter: Łukasz Gajowy
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
