[
https://issues.apache.org/jira/browse/BEAM-6146?focusedWorklogId=258264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-258264
]
ASF GitHub Bot logged work on BEAM-6146:
----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Jun/19 01:49
Start Date: 12/Jun/19 01:49
Worklog Time Spent: 10m
Work Description: tvalentyn commented on pull request #7180: [BEAM-6146]
Run pre-commit wordcount in batch and streaming mode.
URL: https://github.com/apache/beam/pull/7180#discussion_r292720519
##########
File path: sdks/python/build.gradle
##########
@@ -268,32 +269,37 @@ task directRunnerIT(dependsOn: 'installGcpTest') {
//
// ./gradlew :beam-sdks-python:portableWordCount
-PjobEndpoint=localhost:8099
//
-task portableWordCount(dependsOn: 'installGcpTest') {
- doLast {
- // TODO: Figure out GCS credentials and use real GCS input and output.
- def options = [
- "--input=/etc/profile",
- "--output=/tmp/py-wordcount-direct",
- "--runner=PortableRunner",
- "--experiments=worker_threads=100",
- ]
- if (project.hasProperty("streaming"))
- options += ["--streaming"]
- else
+task portableWordCount {
+ dependsOn portableWordCountTask('portableWordCountExample',
project.hasProperty("streaming"), envdir)
+}
+
+def portableWordCountTask(name, streaming, envdir) {
+ tasks.create(name) {
+ dependsOn = ['installGcpTest']
+ mustRunAfter = [':beam-runners-flink_2.11-job-server-container:docker',
':beam-sdks-python-container:docker']
+ doLast {
+ // TODO: Figure out GCS credentials and use real GCS input and output.
+ def options = [
+ "--input=/etc/profile",
+ "--output=/tmp/py-wordcount-direct",
+ "--runner=PortableRunner",
+ "--experiments=worker_threads=100",
+ ]
+ if (streaming)
+ options += ["--streaming"]
+ else
// workaround for local file output in docker container
- options += ["--environment_cache_millis=10000"]
- if (project.hasProperty("jobEndpoint"))
- options += ["--job_endpoint=${project.property('jobEndpoint')}"]
- exec {
- executable 'sh'
- args '-c', ". ${envdir}/bin/activate && python -m
apache_beam.examples.wordcount ${options.join(' ')}"
- // TODO: Check that the output file is generated and runs.
+ options += ["--environment_cache_millis=10000"]
+ if (project.hasProperty("jobEndpoint"))
+ options += ["--job_endpoint=${project.property('jobEndpoint')}"]
+ exec {
+ executable 'sh'
+ args '-c', ". ${envdir}/bin/activate && python -m
apache_beam.examples.wordcount ${options.join(' ')}"
Review comment:
This pipeline does not have unbounded sources - does it actually make sense
to run it in streaming mode on portable runner? I think we would want to run
something like
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/streaming_wordcount.py
to exercise streaming usecase.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 258264)
Time Spent: 4h 10m (was: 4h)
> Portable Flink End to end precommit test
> ----------------------------------------
>
> Key: BEAM-6146
> URL: https://issues.apache.org/jira/browse/BEAM-6146
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Ankur Goenka
> Assignee: Ankur Goenka
> Priority: Major
> Fix For: 2.10.0
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> Create an end to end wordcount based pipeline test executed on precommit.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)