[
https://issues.apache.org/jira/browse/BEAM-3370?focusedWorklogId=132416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-132416
]
ASF GitHub Bot logged work on BEAM-3370:
----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Aug/18 16:09
Start Date: 08/Aug/18 16:09
Worklog Time Spent: 10m
Work Description: lgajowy opened a new pull request #6179: [BEAM-3370 &
BEAM-3359] Enable running IOIT on flink
URL: https://github.com/apache/beam/pull/6179
This PR fixes two issues mentioned above.
BEAM-3359: I'm not sure whether we need TestFlinkRunner at all, but I found it
very easy to allow passing flinkMaster to the TestPipeline runner. Because it
doesn't fail any tests, I find it a valuable change. Otherwise, it is very hard
to debug "why is flinkMaster not set when I'm using TestFlinkRunner".
BEAM-3370: this change is inspired by the Dataflow runner implementation,
although it is not an exact copy. I'm not sure if tempLocation is the best place
to save the jar that is produced from directories with compiled classes. I also
had doubts about whether to delete it or not. The Dataflow runner does not
delete it, but stages it in "stagingLocation" on the gs filesystem. Should I use
an analogous option in Flink for that and behave the same way?
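
For illustration, the packaging step discussed above could look roughly like the
sketch below. It is not the code from this pull request: the class name
`DirectoryStagingSketch` and both method names are hypothetical, and it uses only
the JDK. It replaces every directory in a list of files to stage with a uniquely
named jar written under a given temp location, and it leaves the jar in place,
since whether to delete it is the open question here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (assumption): not part of Beam or of this pull request. */
final class DirectoryStagingSketch {

  /**
   * Returns a copy of filesToStage in which every directory entry is replaced
   * by the path of a jar created under tempLocation. Other entries are kept.
   */
  static List<String> prepareFilesToStage(List<String> filesToStage, String tempLocation)
      throws IOException {
    List<String> result = new ArrayList<>();
    for (String entry : filesToStage) {
      Path path = Paths.get(entry);
      if (Files.isDirectory(path)) {
        result.add(packageDirectory(path, Paths.get(tempLocation)).toString());
      } else {
        result.add(entry);
      }
    }
    return result;
  }

  /** Zips the regular files under dir into a uniquely named jar in tempLocation. */
  static Path packageDirectory(Path dir, Path tempLocation) throws IOException {
    Files.createDirectories(tempLocation);
    // A random suffix keeps jars from different directories from colliding.
    Path jar = tempLocation.resolve(dir.getFileName() + "-" + UUID.randomUUID() + ".jar");

    // Collect the files before creating the jar, in a stable order.
    List<Path> files;
    try (Stream<Path> walk = Files.walk(dir)) {
      files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }

    try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(jar))) {
      for (Path file : files) {
        // Jar entry names always use forward slashes, regardless of platform.
        String name = dir.relativize(file).toString().replace('\\', '/');
        zip.putNextEntry(new ZipEntry(name));
        Files.copy(file, zip);
        zip.closeEntry();
      }
    }
    return jar;
  }
}
```

A caller would pass the detected classpath entries and the value of tempLocation,
then hand the returned list to the runner in place of the original one.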
@aljoscha @lukecwik could you take a look?
Issue Time Tracking
-------------------
Worklog Id: (was: 132416)
Time Spent: 10m
Remaining Estimate: 0h
> Add ability to stage directories with compiled classes to Flink
> ---------------------------------------------------------------
>
> Key: BEAM-3370
> URL: https://issues.apache.org/jira/browse/BEAM-3370
> Project: Beam
> Issue Type: Sub-task
> Components: runner-flink
> Reporter: Lukasz Gajowy
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, when _filesToStage_ contains a path to a directory with resources,
> the Flink runner throws a {{"java.io.FileNotFoundException: <path_to_the_dir> (Is
> a directory)"}}. A way to include directory resources would be helpful.
> This "blocker" occurs while trying to run IOITs on the Flink runner, which
> makes them very inconvenient, if not impossible, to run. When the tests are
> run via the "mvn verify" command, a "test-classes" *directory* gets detected by
> the detectClasspathResourcesToStage() method, which in turn causes the above error.
> One way to solve this issue is to package the directories into jars with unique
> names and update the paths accordingly before staging the files on Flink.
> Something similar is already done in the Dataflow runner
> ([GcsStager|https://github.com/apache/beam/blob/cd186a531aaff0b21cf009b034e1a41f7e7b64af/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/GcsStager.java#L74]),
> more specifically in
> [PackageUtil|https://github.com/apache/beam/blob/cd186a531aaff0b21cf009b034e1a41f7e7b64af/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java#L280]
> class. We are able to run the tests on Dataflow thanks to that.
> As I checked in a [small experiment of
> mine|https://github.com/lgajowy/beam/commits/spark-and-flink-run-tests],
> providing an analogous change makes it possible to run the tests on a Flink
> cluster.