[ https://issues.apache.org/jira/browse/BEAM-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Miles Edwards updated BEAM-8189:
--------------------------------
Description:
h1. The Setup
I have two Projects on the Google Cloud Platform:
1) Service Project for my Dataflow jobs
2) Host Project for the Shared VPC & Subnetworks
The Host Project has Firewall Rules configured for the Dataflow job, i.e. allow all traffic, allow all internal traffic, allow all traffic tagged with 'dataflow', etc.
h1. The Args
{code:java}
--project <host project name>
--network <shared vpc project name>
--subnetwork "https://www.googleapis.com/compute/v1/projects/<shared vpc project name>/regions/<region job is running in service project>/subnetworks/<name of subnetwork in shared vpc project>"
--service_account_email=<service account with Compute Network User permission for both projects, shared vpc network & subnetwork>
{code}
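For illustration, here is a minimal sketch of how such options can be passed from a Python pipeline submitted to Dataflow. All project, region, network, subnetwork, bucket and service account names below are placeholders, not my actual values:
{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder values only; substitute your own service project, host project,
# region, subnetwork, bucket and service account.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-service-project",        # project the job is launched in
    region="europe-west1",
    network="miles-qa-vpc",              # network name in the host project
    subnetwork=(
        "https://www.googleapis.com/compute/v1/projects/my-host-project/"
        "regions/europe-west1/subnetworks/my-subnetwork"
    ),
    service_account_email="dataflow-worker@my-service-project.iam.gserviceaccount.com",
    temp_location="gs://my-bucket/temp",
)

# A tiny pipeline with a GroupByKey, i.e. the kind of shuffle that hangs here.
with beam.Pipeline(options=options) as p:
    (
        p
        | "Create" >> beam.Create([("a", 1), ("a", 2), ("b", 3)])
        | "GroupByKey" >> beam.GroupByKey()
        | "Print" >> beam.Map(print)
    )
{code}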
h1. The Problem
The job hangs on shuffles when it is set to run within the service project but to use the host project's network. I also see the following warning:
{code:java}
The network miles-qa-vpc doesn't have rules that open TCP ports 1-65535 for
internal connection with other VMs. Only rules with a target tag 'dataflow' or
empty target tags set apply. If you don't specify such a rule, any pipeline
with more than one worker that shuffles data will hang. Causes: No firewall
rules associated with your network.
{code}
h1. What I've Tried
As mentioned in my
[StackOverflow|[https://stackoverflow.com/questions/57868089/google-dataflow-warnings-when-using-service-host-projects-shared-vpcs-firew]]
, I've tried the following:
1. Passed only the "subnetwork" arg without "network", but that only changes the warning to say "default" instead of "miles-qa-vpc", which looks like a logging error to me.
2. Firewall rules have been configured to (see the sketch after this list):
- allow all traffic
- allow all internal traffic
- allow all traffic with the source tag 'dataflow'
- allow all traffic with the target tag 'dataflow'
3. Service Account has been configured to have Compute Network User permissions
in both projects.
4. Ensured the subnetwork is in the same region as the job.
5. The network in the service project is happily serving a dedicated cluster for other purposes in the host project.
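To make item 2 concrete, here is a rough sketch of how one of those rules could be created through the Compute Engine API. This is an illustration only: the host project name and rule name are placeholders, and the actual rules were configured in the Cloud Console.
{code:python}
# Requires google-api-python-client; uses Application Default Credentials.
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

# Placeholder host project and rule name; the network name matches the warning.
firewall_body = {
    "name": "allow-dataflow-internal",
    "network": "projects/my-host-project/global/networks/miles-qa-vpc",
    "allowed": [{"IPProtocol": "tcp", "ports": ["1-65535"]}],
    "sourceTags": ["dataflow"],
    "targetTags": ["dataflow"],
}

compute.firewalls().insert(project="my-host-project", body=firewall_body).execute()
{code}
The warning specifically asks for TCP ports 1-65535 to be open between worker VMs, which is exactly what a rule like the one above opens.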
It genuinely seems like the spawned Compute Engine instances are not picking up this configuration.
I expect the Dataflow job not to report the firewall issue and to handle shuffles (GroupByKey etc.) successfully.
> DataflowRunner does not work with Shared VPC in another Project
> ---------------------------------------------------------------
>
> Key: BEAM-8189
> URL: https://issues.apache.org/jira/browse/BEAM-8189
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 2.15.0
> Reporter: Miles Edwards
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.2#803003)