[
https://issues.apache.org/jira/browse/BEAM-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972922#comment-16972922
]
Thomas Weise commented on BEAM-8591:
------------------------------------
Please see the following email thread and linked doc for options on how to run
Beam with Flink on k8s:
[https://lists.apache.org/thread.html/4e377933da8f5abb817413fcbd1de172b81a468c8a4d782255f46a1a@%3Cdev.beam.apache.org%3E]
The LOOPBACK environment won't work in a distributed environment; it is
designed for local execution.
I can see where the confusion may come from; the Flink runner page is going
to receive some updates soon: BEAM-8243. CC: [~ibzib]
The DOCKER environment is also most likely not going to work; the doc explains
why (unless you are going to set up Docker within k8s).
That leaves you with either the PROCESS environment or EXTERNAL (using the
Python SDK worker pool option). Take a look at the doc and ask for
clarification there or here:
[https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/]
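For reference, here is a minimal sketch of what the EXTERNAL option could look like on the Python side. It assumes a Beam Python SDK worker pool is already running next to the Flink task manager (for example as a sidecar container) and is reachable at localhost:50000; the job endpoint and worker pool address are placeholders for this illustration, not values taken from this issue:
{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# EXTERNAL delegates SDK worker startup to a separately managed worker pool,
# so the Flink task manager does not need to spawn anything itself.
# localhost:50000 is an assumed worker pool address; adjust to your deployment.
options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",          # Beam job server (placeholder)
    "--environment_type=EXTERNAL",
    "--environment_config=localhost:50000",   # SDK worker pool (placeholder)
])

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | 'Create' >> beam.Create(['a', 'b', 'c'])
     | 'Print' >> beam.Map(print))
{code}
The worker pool itself would typically be started from the Beam Python SDK container in its worker pool mode; the linked doc covers how to wire that into the task manager pods.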
> Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.
> ----------------------------------------------------------------------------
>
> Key: BEAM-8591
> URL: https://issues.apache.org/jira/browse/BEAM-8591
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Mingliang Gong
> Priority: Major
>
> h2. Setup Clusters
> * Setup Local Flink Cluster:
> [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
> * Setup Kubernetes Flink Cluster using Minikube:
> [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
> h2. Verify Clusters
> Execute the command “./bin/flink run examples/streaming/WordCount.jar”. Both
> the local and K8S Flink clusters work fine.
> h2. Using Apache Beam Flink Runner
> Instruction: [https://beam.apache.org/documentation/runners/flink/]
> Sample Pipeline Code:
> {code:python}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
>
> options = PipelineOptions([
>     "--runner=PortableRunner",
>     "--job_endpoint=localhost:8099",
>     "--environment_type=LOOPBACK"
> ])
>
> with beam.Pipeline(options=options) as pipeline:
>     data = ["Sample data",
>             "Sample data - 0",
>             "Sample data - 1"]
>     raw_data = (pipeline
>                 | 'CreateHardCodeData' >> beam.Create(data)
>                 | 'Map' >> beam.Map(lambda line: line + '.')
>                 | 'Print' >> beam.Map(print))
> {code}
> Verify different environment_type values in the Python SDK Harness configuration:
> *environment_type=LOOPBACK*
> # Run pipeline on local cluster: Works fine
> # Run pipeline on K8S cluster: Exceptions are thrown:
> java.lang.Exception: The user defined 'open()' method caused an exception:
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException:
> UNAVAILABLE: io exception Caused by:
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> Connection refused: localhost/127.0.0.1:51017
> *environment_type=DOCKER*
> # Run pipeline on local cluster: Works fine
> # Run pipeline on K8S cluster: Exceptions are thrown:
> Caused by: java.io.IOException: Cannot run program "docker": error=2, No
> such file or directory.
>