[
https://issues.apache.org/jira/browse/BEAM-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Miles Edwards closed BEAM-8189.
-------------------------------
Fix Version/s: 2.16.0
Resolution: Fixed
> Python DataflowRunner fails when using a Shared VPC from another project
> ------------------------------------------------------------------------
>
> Key: BEAM-8189
> URL: https://issues.apache.org/jira/browse/BEAM-8189
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 2.15.0
> Reporter: Miles Edwards
> Priority: Major
> Fix For: 2.16.0
>
>
> h1. The Setup:
> I have two Projects on the Google Cloud Platform
> 1) Service Project for my Dataflow jobs
> 2) Host Project for Shared VPC & Subnetworks
> The Host Project has Firewall Rules configured for the Dataflow job, e.g.
> allow all traffic, allow all internal traffic, allow all traffic tagged with
> 'dataflow', etc.
>
> h1. The Args
> {code}
> --project <host project name>
> --network <shared vpc network name>
> --subnetwork "https://www.googleapis.com/compute/v1/projects/<shared vpc
> project name>/regions/<region job is running in service
> project>/subnetworks/<name of subnetwork in shared vpc project>"
> --service_account_email=<service account with Compute Network User permission
> for both projects, shared vpc network & subnetwork>
> {code}
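For reference, the fully qualified subnetwork URL passed above can be assembled programmatically before handing it to the pipeline options. A minimal sketch (the helper name and the example project/region/subnetwork values are hypothetical, not taken from the failing job):

```python
def subnetwork_url(host_project: str, region: str, subnetwork: str) -> str:
    """Build the fully qualified subnetwork URL that Dataflow expects
    when the subnetwork lives in a Shared VPC host project."""
    return (
        "https://www.googleapis.com/compute/v1/projects/"
        f"{host_project}/regions/{region}/subnetworks/{subnetwork}"
    )

# Made-up names for illustration; the region segment must match the
# region the Dataflow job actually runs in.
print(subnetwork_url("my-host-project", "europe-west1", "my-subnet"))
```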
> h1. The Problem
> The job will hang when performing shuffle operations. I will also see the
> following warning:
> {code}
> The network miles-qa-vpc doesn't have rules that open TCP ports 1-65535 for
> internal connection with other VMs. Only rules with a target tag 'dataflow'
> or empty target tags set apply. If you don't specify such a rule, any
> pipeline with more than one worker that shuffles data will hang. Causes: No
> firewall rules associated with your network.
> {code}
>
> h1. What I've Tried
> [StackOverflow|https://stackoverflow.com/questions/57868089/google-dataflow-warnings-when-using-service-host-projects-shared-vpcs-firew]
> 1. Passing only the "subnetwork" arg without "network", but that only changes
> the warning to say "default" instead of "miles-qa-vpc", which sounds like a
> logging error to me.
> 2. Firewall rules have been configured to:
> - allow all traffic
> - allow all internal traffic
> - allow all traffic with the source tag 'dataflow'
> - allow all traffic with the target tag 'dataflow'
> 3. Service Account has been configured to have Compute Network User
> permissions in both projects.
> 4. Ensured subnetwork is in the same region as the job.
> 5. The shared network is already happily serving a dedicated cluster in the
> host project for other purposes.
> It genuinely seems like the spawned Compute Engine instances are not picking
> up the firewall configuration.
> I expect the Dataflow job not to report the firewall issue and to handle
> shuffle operations (GroupBys, etc.) successfully.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)