Hello Everyone,

https://github.com/apache/beam/pull/25686 is approved and merged.  Below I
describe the two new Beam project assets it adds, first for those who know
Kubernetes and terraform and then for those who don't.

*Short Version (For those who know Kubernetes and terraform)*:

This PR provides:

1. An end-to-end Infrastructure-as-Code solution using terraform to
provision a Google Kubernetes Engine (GKE) cluster from scratch, starting
from a custom network, service account and IAM roles, all the way to the
private Autopilot cluster and a bastion host to connect to it.
See instructions:
https://github.com/apache/beam/tree/master/.test-infra/terraform/google-cloud-platform/google-kubernetes-engine

2. A strimzi.io Kafka cluster on Kubernetes.  I tailored it using kustomize
<https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/> to
version control the strimzi operator and allow for cloud-specific
overlays.  The kustomize solution uses a GKE internal TCP load balancer
overlay.
See instructions:
https://github.com/apache/beam/tree/master/.test-infra/kafka/strimzi

*Long Version (For those not as familiar with Kubernetes and terraform)*:

What does this have to do with Beam?

This email relates to Beam by providing a resource to help with our Beam
I/O related work.  Beam I/Os are tools in the SDK that allow pipelines to
read from and write to various databases and API-dependent resources.  You
may have seen, for example, BigQueryIO, a class in the SDK that provides
transforms for reading from and writing to BigQuery.  In this email's
context, we have KafkaIO, a class that provides transforms for reading from
and writing to Kafka.  Kafka is an event streaming platform (See
https://kafka.apache.org/).  We have an existing Jenkins solution, but it
was designed for integration testing and shuts down some important
resources after tests complete.  We needed a way to spin up a Kafka
resource ourselves in our own environments.

What is Kubernetes?

Kubernetes is an open-source system for automating deployment, scaling and
management of containerized workloads (See kubernetes.io).  This PR chooses
Kubernetes because it is a scalable environment and one typically used by
enterprise users of Beam and KafkaIO.

What is Infrastructure-as-Code and terraform?

Infrastructure-as-Code (IaC) is a declarative process of managing
resources using code: you describe the resources you want and a tool
works out how to create them.  Bash scripts or our existing groovy scripts
that provision resources are also code, but they are imperative methods
that are arguably less readable and more error-prone.  Terraform is one
established IaC solution to provision resources in the major cloud
providers.  This PR focuses on Google Cloud and particularly Google
Kubernetes Engine (GKE).  In order to provision GKE, the terraform solution
also needs to provision the service account, network and other related
resources per security best practice.
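
To make the declarative idea a bit more concrete, here is a small Python
sketch.  It is not how terraform is implemented and is not part of the PR;
the resource names and the plan() helper are made up for illustration.  The
point is only that you state the desired resources and a tool computes the
steps, instead of you scripting each step by hand.

```python
def plan(desired, actual):
    """Compare desired resources with what exists and return the actions needed.

    Hypothetical helper for illustration only; real IaC tools also track
    dependencies, attributes and in-place updates.
    """
    to_create = sorted(set(desired) - set(actual))
    to_delete = sorted(set(actual) - set(desired))
    return {"create": to_create, "delete": to_delete}


# Desired state: everything the cluster needs (names are assumptions).
desired = {"custom-network", "service-account", "gke-cluster"}
# Actual state: only the network exists so far.
actual = {"custom-network"}

print(plan(desired, actual))
# prints {'create': ['gke-cluster', 'service-account'], 'delete': []}
```

Running the same plan again after the resources exist yields no actions,
which is what makes the approach repeatable.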

What value does this PR provide?

This PR provides an end-to-end solution to provision a private GKE
Autopilot cluster and a strimzi Kafka cluster.  It allows a Beam developer
to deploy both in their own Google Cloud project.  Additionally, there are
instructions on how to use the solution in your KafkaIO related pipeline.
They are listed below in recommended order:
1) https://github.com/apache/beam/tree/master/.test-infra/terraform/google-cloud-platform/google-kubernetes-engine

2) https://github.com/apache/beam/tree/master/.test-infra/kafka/strimzi
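
For a rough idea of what a KafkaIO related pipeline pointed at such a
cluster might look like, here is a minimal sketch using the Beam Python
SDK's ReadFromKafka transform.  It is not taken from the instructions
above.  The bootstrap address, port, topic and group id are placeholders I
made up; substitute the internal load balancer address of your own
deployment.  Note that ReadFromKafka is a cross-language transform, so I
believe it also needs a Java runtime available when the pipeline runs.

```python
def kafka_consumer_config(bootstrap_servers, group_id="beam-example"):
    """Build a Kafka consumer config dict (group id is a made-up default)."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    }


def read_values(pipeline, bootstrap_servers, topic):
    """Attach a Kafka read to the pipeline and keep only the record values."""
    # Imported here so the config helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka

    return (
        pipeline
        | "ReadFromKafka" >> ReadFromKafka(
            consumer_config=kafka_consumer_config(bootstrap_servers),
            topics=[topic],
        )
        # By default each element is a (key, value) pair of bytes.
        | "Values" >> beam.Map(lambda kv: kv[1])
    )


# Placeholder address and port; use your own load balancer's values.
config = kafka_consumer_config("10.0.0.5:9094")
print(config["bootstrap.servers"])
```

Because the load balancer is internal, the pipeline has to run somewhere
that can reach the cluster's network.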

Best,

Damon

On Wed, Mar 1, 2023 at 4:12 PM Damon Douglas <damondoug...@google.com>
wrote:

> Hello Everyone,
>
> I created a PR to provide to the Beam community terraform code to
> provision a private Google Kubernetes Engine and kubernetes manifests to
> provision an internally TCP load balanced strimzi.io Kafka cluster.  This
> solution helped me a lot when I needed a repeatable solution to spin up
> resources for reading from and writing to Kafka without having to scratch
> my head and remember steps.
>
> https://github.com/apache/beam/pull/25686
>
> This is *not* meant to replace our current test-infra kubernetes and
> kafka setup which is designed for our automated testing using jenkins.
>
> Best,
>
> Damon
>
>
