[
https://issues.apache.org/jira/browse/BEAM-3370?focusedWorklogId=132889&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-132889
]
ASF GitHub Bot logged work on BEAM-3370:
----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Aug/18 09:50
Start Date: 09/Aug/18 09:50
Worklog Time Spent: 10m
Work Description: lgajowy commented on issue #6179: [BEAM-3370 &
BEAM-3359] Enable running IOIT on flink
URL: https://github.com/apache/beam/pull/6179#issuecomment-411702729
@aljoscha Running the IOITs on Flink is already possible with this PR, provided
that you have a standalone cluster somewhere. You can run them using the
following command (example):
```
./gradlew integrationTest -p sdks/java/io/file-based-io-tests/ \
  -DintegrationTestPipelineOptions='["--numberOfRecords=1000", "--filenamePrefix=PREFIX", "--runner=TestFlinkRunner", "--flinkMaster=127.0.0.1:8081", "--tempLocation=/tmp/"]' \
  -DintegrationTestRunner=flink \
  --tests org.apache.beam.sdk.io.text.TextIOIT \
  --info
```
I was able to run all tests smoothly except HadoopInputFormatIOIT,
ParquetIOIT and TFRecordIOIT: Flink finished with a success status, but the
JUnit tests returned an error ("Job submission failed."). IMO this is a
separate issue to be resolved (maybe you know what the cause could be?). I'll
probably dig into this later.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 132889)
Time Spent: 0.5h (was: 20m)
> Add ability to stage directories with compiled classes to Flink
> ---------------------------------------------------------------
>
> Key: BEAM-3370
> URL: https://issues.apache.org/jira/browse/BEAM-3370
> Project: Beam
> Issue Type: Sub-task
> Components: runner-flink
> Reporter: Lukasz Gajowy
> Priority: Minor
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, when _filesToStage_ contains a path to a directory with resources,
> the Flink runner throws a {{"java.io.FileNotFoundException: <path_to_the_dir> (Is
> a directory)"}}. A way to include directory resources would be helpful.
> This "blocker" occurs while trying to run the IOITs on the Flink runner, which
> makes them impossible or at least very inconvenient to run. When the tests are
> run via the "mvn verify" command, a "test-classes" *directory* gets detected by
> the detectClasspathResourcesToStage() method, which in turn causes the above error.
> One way to solve this issue is to package the directories into jars with unique
> names and update the paths accordingly before staging the files on Flink.
> Something similar is already done in the Dataflow runner
> ([GcsStager|https://github.com/apache/beam/blob/cd186a531aaff0b21cf009b034e1a41f7e7b64af/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/GcsStager.java#L74]),
> more specifically in the
> [PackageUtil|https://github.com/apache/beam/blob/cd186a531aaff0b21cf009b034e1a41f7e7b64af/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java#L280]
> class. We are able to run the tests on Dataflow thanks to that.
> As I checked in a [small experiment of
> mine|https://github.com/lgajowy/beam/commits/spark-and-flink-run-tests],
> an analogous change makes it possible to run the tests on a Flink
> cluster.
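For illustration, the "package the directory into a jar before staging" step
proposed above could be sketched roughly as follows. This is a standalone
sketch, not the actual Beam or Dataflow staging code; the class name
DirectoryStager and every detail here are hypothetical:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class DirectoryStager {

  /** Packages the contents of {@code directory} into a jar file at {@code jarPath}. */
  public static void zipDirectory(File directory, Path jarPath) throws IOException {
    try (JarOutputStream jar = new JarOutputStream(new FileOutputStream(jarPath.toFile()))) {
      addEntries(jar, directory, "");
    }
  }

  // Recursively add files under dir, using jar-style relative entry names.
  private static void addEntries(JarOutputStream jar, File dir, String prefix) throws IOException {
    File[] children = dir.listFiles();
    if (children == null) {
      return;
    }
    for (File file : children) {
      if (file.isDirectory()) {
        addEntries(jar, file, prefix + file.getName() + "/");
      } else {
        jar.putNextEntry(new JarEntry(prefix + file.getName()));
        Files.copy(file.toPath(), jar);
        jar.closeEntry();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulate a "test-classes" directory that would otherwise be staged as-is
    // (and trigger the FileNotFoundException described above).
    Path dir = Files.createTempDirectory("test-classes");
    Files.write(dir.resolve("Example.txt"), "compiled-class-placeholder".getBytes());

    // Package it into a uniquely named jar; this jar path would then replace
    // the directory entry in filesToStage before submitting the job to Flink.
    Path jar = Files.createTempFile("staged-classes-", ".jar");
    zipDirectory(dir.toFile(), jar);

    try (JarFile jf = new JarFile(jar.toFile())) {
      jf.stream().forEach(e -> System.out.println("staged entry: " + e.getName()));
    }
  }
}
```

The point of the unique jar name is that several staged directories (e.g.
"test-classes" from different modules) must not collide once packaged.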
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)