This is an automated email from the ASF dual-hosted git repository.
suvasude pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-gobblin.git
The following commit(s) were added to refs/heads/master by this push:
new 570f3d7 [GOBBLIN-932] Create deployment for Azure, clean up existing deployments
570f3d7 is described below
commit 570f3d7129e708fd9583ef47676333cc486a97a5
Author: William Lo <[email protected]>
AuthorDate: Fri Dec 6 15:47:57 2019 -0800
[GOBBLIN-932] Create deployment for Azure, clean up existing deployments
Closes #2799 from Will-Lo/azure-deploy
---
.../user-guide/Azure-Kubernetes-Deployment.md | 83 ++++++++++++++++++++++
.../user-guide/Building-Gobblin-as-a-Service.md | 22 +++++-
.../gobblin-service/azure-cluster/ingress.yaml | 13 ++++
.../azure-cluster/kustomization.yaml | 4 ++
.../{application.yaml => deployment.yaml} | 24 ++-----
.../flowconfig-templates/distcp.template | 51 +++++++++++++
.../gobblin-service/base-cluster/ingress.yaml | 4 +-
.../base-cluster/kustomization.yaml | 11 +++
.../gobblin-service/base-cluster/service.yaml | 14 ++++
.../gobblin-service/base-cluster/storage.yaml | 28 --------
.../{application.yaml => deployment.yaml} | 22 ++----
.../mysql-cluster/kustomization.yaml | 5 +-
.../mysql-cluster/mysql-deployment.yaml | 2 +-
.../gobblin-service/mysql-cluster/mysql-pv.yaml | 2 -
14 files changed, 214 insertions(+), 71 deletions(-)
diff --git a/gobblin-docs/user-guide/Azure-Kubernetes-Deployment.md b/gobblin-docs/user-guide/Azure-Kubernetes-Deployment.md
new file mode 100644
index 0000000..61b789c
--- /dev/null
+++ b/gobblin-docs/user-guide/Azure-Kubernetes-Deployment.md
@@ -0,0 +1,83 @@
+# GaaS on Azure Deployment Steps
+
+## Create Azure Container Registry [Optional]
+
+1\) Log into Azure Container Registry
+
+```bash
+$ az acr login --name gobblintest
+```
+
+2\) Tag the Docker images for the container registry
+
+```bash
+$ docker tag <gaas_image_id> gobblintest.azurecr.io/gobblin-service
+$ docker tag <standalone_image_id> gobblintest.azurecr.io/gobblin-standalone
+```
+
+3\) Push the images
+
+```bash
+$ docker push gobblintest.azurecr.io/gobblin-service
+$ docker push gobblintest.azurecr.io/gobblin-standalone
+```
+
+The images should now be hosted on Azure with the `latest` tag.
+
+## Deploy the base K8s cluster
+
+1\) Create a resource group on Azure (for example with `az group create --name <resource_group_name> --location <location>`)
+
+2\) Create a cluster and deploy it onto the resource group
+
+```bash
+az aks create --resource-group <resource_group_name> --name GaaS-cluster-test --node-count 1 --enable-addons monitoring --generate-ssh-keys
+```
+
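+3\) Switch kubectl to use the new Azure cluster, e.g. with `az aks get-credentials --resource-group <resource_group_name> --name GaaS-cluster-test`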
+
+4\) Check status of cluster
+
+```bash
+$ kubectl get pods
+```
+
+## Install the nginx ingress controller to expose the Azure cluster
+
+1\) Install helm if you don't currently have it
+
+```bash
+brew install helm
+helm init
+```
+
+2\) Deploy the nginx helm chart to create the ingress
+
+```bash
+helm install stable/nginx-ingress
+```
+
+If this is the first time deploying with Helm v2, you will need to set up Tiller, the server-side component of Helm v2 that runs inside the cluster, with a service account that has cluster-admin permissions. Otherwise you'll run into this [issue](https://github.com/helm/helm/issues/2224):
+
+> Error: configmaps is forbidden: User "system:serviceaccount:kube-system:default" cannot list configmaps in the namespace "kube-system"
+
+To set up Tiller (these steps are also in the linked issue):
+
+```bash
+kubectl create serviceaccount --namespace kube-system tiller
+kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
+kubectl edit deploy --namespace kube-system tiller-deploy # then add the line "serviceAccount: tiller" under spec.template.spec
+```
+
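+3\) Deploy GaaS together with the ingress defined in `gobblin-kubernetes/gobblin-service/azure-cluster` by running `kubectl apply -k gobblin-kubernetes/gobblin-service/azure-cluster/` from the repository root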
+
+4\) Run `kubectl get services`, and the output should look something like this:
+
+```text
+gaas-svc                                        ClusterIP      10.0.176.58    <none>          6956/TCP                     16h
+honorary-possum-nginx-ingress-controller        LoadBalancer   10.0.182.255   <EXTERNAL_IP>   80:30488/TCP,443:31835/TCP   6m13s
+honorary-possum-nginx-ingress-default-backend   ClusterIP      10.0.236.153   <none>          80/TCP                       6m13s
+kubernetes                                      ClusterIP      10.0.0.1       <none>          443/TCP                      10d
+```
+
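+5\) Send requests to the EXTERNAL-IP of the `honorary-possum-nginx-ingress-controller` service (`honorary-possum` is a release name that Helm generated at random, so yours will differ)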
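Step 5 above needs the EXTERNAL-IP of the ingress controller service. As a hedged sketch (Python, not part of this commit), the table printed by `kubectl get services` can be scanned for it. The function name `find_ingress_external_ip` and the address `203.0.113.7` (a reserved documentation address standing in for `<EXTERNAL_IP>`) are hypothetical.

```python
from typing import Optional


def find_ingress_external_ip(
    services_output: str, name_suffix: str = "nginx-ingress-controller"
) -> Optional[str]:
    """Return the EXTERNAL-IP of the first service whose name ends with name_suffix.

    Assumes the default `kubectl get services` layout:
    NAME  TYPE  CLUSTER-IP  EXTERNAL-IP  PORT(S)  AGE
    """
    for line in services_output.splitlines():
        columns = line.split()
        # Skip blank lines, the header row, and rows too short to hold an EXTERNAL-IP.
        if len(columns) < 4 or columns[0] == "NAME":
            continue
        name, external_ip = columns[0], columns[3]
        # kubectl prints <pending> until the load balancer has an address.
        if name.endswith(name_suffix) and external_ip not in ("<none>", "<pending>"):
            return external_ip
    return None


# Hypothetical output; 203.0.113.7 is a documentation-only address.
SAMPLE = """\
NAME                                            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
gaas-svc                                        ClusterIP      10.0.176.58    <none>        6956/TCP                     16h
honorary-possum-nginx-ingress-controller        LoadBalancer   10.0.182.255   203.0.113.7   80:30488/TCP,443:31835/TCP   6m13s
honorary-possum-nginx-ingress-default-backend   ClusterIP      10.0.236.153   <none>        80/TCP                       6m13s
"""

if __name__ == "__main__":
    ip = find_ingress_external_ip(SAMPLE)
    print(f"http://{ip}/")  # prints http://203.0.113.7/
```

In practice, `kubectl get service <name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'` returns the address directly and avoids parsing the human-readable table, whose column layout is not a stable interface.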
diff --git a/gobblin-docs/user-guide/Building-Gobblin-as-a-Service.md b/gobblin-docs/user-guide/Building-Gobblin-as-a-Service.md
index 8cb6fcf..5661b18 100644
--- a/gobblin-docs/user-guide/Building-Gobblin-as-a-Service.md
+++ b/gobblin-docs/user-guide/Building-Gobblin-as-a-Service.md
@@ -31,4 +31,24 @@ To run the full docker compose:
4. `docker compose -f gobblin-docker/gobblin-service/alpine-gaas-latest/docker-compose.yml build`
5. `docker compose -f gobblin-docker/gobblin-service/alpine-gaas-latest/docker-compose.yml up`
-The docker container exposes the endpoints from Gobblin as a Service which can be accessed on `localhost:6956`
\ No newline at end of file
+The docker container exposes the endpoints from Gobblin as a Service which can be accessed on `localhost:6956`
+
+# Running Gobblin as a Service with Kubernetes
+Gobblin as a Service can also be deployed to any Kubernetes environment using the configurations in `gobblin-kubernetes/gobblin-service`.
+
+Currently, the YAML files use [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) for configuration management. In the future, we may use Helm instead.
+
+The configuration is split into 3 environments:
+1) base-cluster (deploys one pod each of GaaS and Gobblin standalone; GaaS writes jobSpecs to a folder tracked by the standalone instance)
+2) mysql-cluster (builds on base-cluster and uses MySQL instead of the file system for the spec stores; future work may involve writing to a job queue to be picked up by Gobblin standalone)
+3) azure-cluster (builds on mysql-cluster to deploy a development environment on Microsoft Azure); see the [Azure deployment steps](./Azure-Kubernetes-Deployment.md)
+
+To add a flow config template for GaaS to use, add the `.template` file to `gobblin-kubernetes/gobblin-service/base-cluster/flowconfig-templates/` and list it under the `flowconfig-templates` entry of `configMapGenerator` in `base-cluster/kustomization.yaml`.
+For production purposes, flow config templates should be stored in a proper file system or a database instead of being added to the configmap.
+
+To deploy any of these clusters, run the following command from the repository root, replacing `<ENV>` with the environment name:
+```bash
+kubectl apply -k gobblin-kubernetes/gobblin-service/<ENV>/
+```
+
+Then find the external IP of the cluster and start sending requests.
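The deployment command above can be wrapped so that only the three documented environments are accepted. This is a minimal sketch, assuming it runs from the repository root with `kubectl` on the `PATH`; `apply_command`, `deploy`, and `ENVIRONMENTS` are hypothetical names, not part of the repository.

```python
import subprocess
from typing import List

# The three environments described above, each a Kustomize directory under this root.
KUSTOMIZE_ROOT = "gobblin-kubernetes/gobblin-service"
ENVIRONMENTS = ("base-cluster", "mysql-cluster", "azure-cluster")


def apply_command(env: str) -> List[str]:
    """Build the `kubectl apply -k` argv for one environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {env!r}; expected one of {ENVIRONMENTS}")
    return ["kubectl", "apply", "-k", f"{KUSTOMIZE_ROOT}/{env}/"]


def deploy(env: str) -> None:
    """Run the command; check=True raises CalledProcessError if kubectl fails."""
    subprocess.run(apply_command(env), check=True)


if __name__ == "__main__":
    # Only prints the command; call deploy("mysql-cluster") to actually apply it.
    print(" ".join(apply_command("mysql-cluster")))
```

Passing an argv list to `subprocess.run` (rather than a shell string) keeps the environment name from being interpreted by a shell.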
diff --git a/gobblin-kubernetes/gobblin-service/azure-cluster/ingress.yaml b/gobblin-kubernetes/gobblin-service/azure-cluster/ingress.yaml
new file mode 100644
index 0000000..a419a5a
--- /dev/null
+++ b/gobblin-kubernetes/gobblin-service/azure-cluster/ingress.yaml
@@ -0,0 +1,13 @@
+apiVersion: extensions/v1beta1
+kind: Ingress
+metadata:
+  name: gaas-ingress
+  annotations:
+    # uses an nginx ingress by default; for setup steps, see incubator-gobblin/gobblin-docs/user-guide/Azure-Kubernetes-Deployment.md
+    kubernetes.io/ingress.class: nginx
+    nginx.ingress.kubernetes.io/ssl-redirect: "false"
+    nginx.ingress.kubernetes.io/rewrite-target: /$1
+spec:
+  backend:
+    serviceName: gaas-svc
+    servicePort: 6956
diff --git a/gobblin-kubernetes/gobblin-service/azure-cluster/kustomization.yaml b/gobblin-kubernetes/gobblin-service/azure-cluster/kustomization.yaml
new file mode 100644
index 0000000..dd4abf1
--- /dev/null
+++ b/gobblin-kubernetes/gobblin-service/azure-cluster/kustomization.yaml
@@ -0,0 +1,4 @@
+bases:
+ - ../mysql-cluster
+patchesStrategicMerge:
+ - ingress.yaml
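The `bases` entries in these kustomization files form a chain: azure-cluster builds on mysql-cluster, which builds on base-cluster. The sketch below models that chain as a plain dictionary (a hypothetical stand-in for reading the real files) and derives the layering order, innermost base first. It is a simplification of what `kubectl apply -k` does, not Kustomize's actual algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical model of the `bases:` entries introduced by this commit.
BASES: Dict[str, List[str]] = {
    "azure-cluster": ["mysql-cluster"],
    "mysql-cluster": ["base-cluster"],
    "base-cluster": [],
}


def layer_order(env: str, bases: Dict[str, List[str]] = BASES) -> List[str]:
    """Return directories in layering order, innermost base first.

    An overlay's own resources and patches sit on top of everything its bases
    produce, so a depth-first walk that emits bases before the overlay gives
    the order in which layers are stacked.
    """
    order: List[str] = []

    def visit(name: str, stack: Tuple[str, ...]) -> None:
        if name in stack:
            raise ValueError(f"cycle in bases: {' -> '.join(stack + (name,))}")
        for base in bases[name]:
            visit(base, stack + (name,))
        if name not in order:
            order.append(name)

    visit(env, ())
    return order


if __name__ == "__main__":
    print(layer_order("azure-cluster"))  # ['base-cluster', 'mysql-cluster', 'azure-cluster']
```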
diff --git a/gobblin-kubernetes/gobblin-service/base-cluster/application.yaml b/gobblin-kubernetes/gobblin-service/base-cluster/deployment.yaml
similarity index 79%
rename from gobblin-kubernetes/gobblin-service/base-cluster/application.yaml
rename to gobblin-kubernetes/gobblin-service/base-cluster/deployment.yaml
index c50a4b7..57bec9d 100644
--- a/gobblin-kubernetes/gobblin-service/base-cluster/application.yaml
+++ b/gobblin-kubernetes/gobblin-service/base-cluster/deployment.yaml
@@ -22,18 +22,17 @@ spec:
- name: 'shared-jobs'
persistentVolumeClaim:
claimName: shared-jobs-claim
- - name: 'shared-template-catalogs'
- persistentVolumeClaim:
- claimName: shared-template-catalogs-claim
+ - name: flowconfig-templates
+ configMap:
+ name: flowconfig-templates
containers:
- name: gobblin-service
image: will97/gobblin-as-a-service:latest
volumeMounts:
- name: shared-jobs
mountPath: /tmp/gobblin-as-service/jobs
- - name: shared-template-catalogs
+ - name: flowconfig-templates
mountPath: /tmp/templateCatalog
-
---
apiVersion: apps/v1
kind: Deployment
@@ -62,18 +61,3 @@ spec:
volumeMounts:
- name: shared-jobs
mountPath: /tmp/gobblin-standalone/jobs
----
-apiVersion: v1
-kind: Service
-metadata:
- name: gaas-svc
- labels:
- app: gobblin-service
-spec:
- type: ClusterIP
- ports:
- - protocol: TCP
- port: 6956
- targetPort: 6956
- selector:
- app: gaas
diff --git a/gobblin-kubernetes/gobblin-service/base-cluster/flowconfig-templates/distcp.template b/gobblin-kubernetes/gobblin-service/base-cluster/flowconfig-templates/distcp.template
new file mode 100644
index 0000000..1626abb
--- /dev/null
+++ b/gobblin-kubernetes/gobblin-service/base-cluster/flowconfig-templates/distcp.template
@@ -0,0 +1,51 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# ====================================================================
+# Job configurations
+# ====================================================================
+
+gobblin.template.required_attributes="from,to"
+
+job.name=Distcp
+job.description="Distributed copy"
+
+# target location for copy
+data.publisher.final.dir=${gobblin.flow.output.dataset.descriptor.path}
+gobblin.dataset.pattern=${gobblin.flow.input.dataset.descriptor.path}
+
+gobblin.dataset.profile.class=org.apache.gobblin.data.management.copy.CopyableGlobDatasetFinder
+
+# ====================================================================
+# Distcp configurations
+# ====================================================================
+
+extract.namespace=org.apache.gobblin.copy
+data.publisher.type=org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher
+source.class=org.apache.gobblin.data.management.copy.CopySource
+writer.builder.class=org.apache.gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
+converter.classes=org.apache.gobblin.converter.IdentityConverter
+
+task.maxretries=0
+workunit.retry.enabled=false
+
+distcp.persist.dir=/tmp/distcp-persist-dir
+
+cleanup.staging.data.per.task=false
+gobblin.trash.skip.trash=true
+state.store.enabled=false
+job.commit.parallelize=true
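The template above declares `gobblin.template.required_attributes` and fills its paths from `${...}` placeholders. As a simplified, hypothetical model (Gobblin resolves templates with its own configuration machinery, and this sketch does not reproduce its semantics), a required-attribute check and placeholder substitution could look like this. `missing_required`, `resolve`, and the example values are assumptions, and treating the required attributes as keys the user config must supply is our reading of the property name.

```python
import re
from typing import Dict, List

# Matches ${some.key} placeholders such as ${gobblin.flow.input.dataset.descriptor.path}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def missing_required(template: Dict[str, str], user_config: Dict[str, str]) -> List[str]:
    """List attributes named in gobblin.template.required_attributes that user_config lacks."""
    # The template stores the list as a quoted, comma-separated string: "from,to".
    raw = template.get("gobblin.template.required_attributes", "").strip('"')
    required = [attr.strip() for attr in raw.split(",") if attr.strip()]
    return [attr for attr in required if attr not in user_config]


def resolve(value: str, values: Dict[str, str]) -> str:
    """Replace each ${key} in value with values[key]; raises KeyError for unknown keys."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], value)


if __name__ == "__main__":
    template = {
        "gobblin.template.required_attributes": '"from,to"',
        "data.publisher.final.dir": "${gobblin.flow.output.dataset.descriptor.path}",
    }
    print(missing_required(template, {"from": "/data/in"}))  # ['to']
    print(resolve(template["data.publisher.final.dir"],
                  {"gobblin.flow.output.dataset.descriptor.path": "/data/out"}))  # /data/out
```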
diff --git a/gobblin-kubernetes/gobblin-service/base-cluster/ingress.yaml b/gobblin-kubernetes/gobblin-service/base-cluster/ingress.yaml
index 7c8f99c..c50c50b 100644
--- a/gobblin-kubernetes/gobblin-service/base-cluster/ingress.yaml
+++ b/gobblin-kubernetes/gobblin-service/base-cluster/ingress.yaml
@@ -1,8 +1,8 @@
-apiVersion: networking.k8s.io/v1beta1
+apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: gaas-ingress
spec:
backend:
serviceName: gaas-svc
- servicePort: 6956
\ No newline at end of file
+ servicePort: 6956
diff --git a/gobblin-kubernetes/gobblin-service/base-cluster/kustomization.yaml b/gobblin-kubernetes/gobblin-service/base-cluster/kustomization.yaml
new file mode 100644
index 0000000..eeb1f1e
--- /dev/null
+++ b/gobblin-kubernetes/gobblin-service/base-cluster/kustomization.yaml
@@ -0,0 +1,11 @@
+resources:
+  - deployment.yaml
+  - storage.yaml
+  - service.yaml
+  - ingress.yaml
+configMapGenerator:
+  # only used for development purposes to allow an easy way to expose template files to GaaS
+  # add flow templates here
+  - name: flowconfig-templates
+    files:
+      - flowconfig-templates/distcp.template
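The `configMapGenerator` entry above turns each listed file into a ConfigMap key, and the Deployment mounts that ConfigMap at `/tmp/templateCatalog`. For a plain path under `files:`, Kustomize uses the file's basename as the key, and a `configMap` volume without `items` exposes each key as a file under its mount path. Kustomize also appends a content hash to generated names and rewrites references to them, which is why the Deployment can refer to `flowconfig-templates` directly. A minimal sketch of the resulting file layout; `configmap_data_keys` and `mounted_files` are hypothetical names.

```python
import posixpath
from typing import Dict, List


def configmap_data_keys(files: List[str]) -> Dict[str, str]:
    """Map each generated ConfigMap key (the file's basename) to its source path."""
    keys: Dict[str, str] = {}
    for path in files:
        key = posixpath.basename(path)
        # This sketch treats two files with the same basename as an error.
        if key in keys:
            raise ValueError(f"duplicate ConfigMap key {key!r} from {path} and {keys[key]}")
        keys[key] = path
    return keys


def mounted_files(mount_path: str, keys: Dict[str, str]) -> List[str]:
    """Paths at which a configMap volume mounted at mount_path exposes each key."""
    return [posixpath.join(mount_path, key) for key in sorted(keys)]


if __name__ == "__main__":
    keys = configmap_data_keys(["flowconfig-templates/distcp.template"])
    print(mounted_files("/tmp/templateCatalog", keys))  # ['/tmp/templateCatalog/distcp.template']
```

Adding another template therefore only needs a new line under `files:`; its basename becomes the file name GaaS sees in the template catalog.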
diff --git a/gobblin-kubernetes/gobblin-service/base-cluster/service.yaml b/gobblin-kubernetes/gobblin-service/base-cluster/service.yaml
new file mode 100644
index 0000000..9d163f1
--- /dev/null
+++ b/gobblin-kubernetes/gobblin-service/base-cluster/service.yaml
@@ -0,0 +1,14 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: gaas-svc
+  labels:
+    app: gobblin-service
+spec:
+  type: ClusterIP
+  ports:
+  - protocol: TCP
+    port: 6956
+    targetPort: 6956
+  selector:
+    app: gaas
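The Service above routes traffic using `spec.selector` only; its own `metadata.labels` (`app: gobblin-service`) play no part in choosing pods. A sketch of equality-based selector matching, assuming the GaaS pod template carries the label `app: gaas` (the pod labels are not shown in this diff); `selector_matches` is a hypothetical name.

```python
from typing import Dict

# spec.selector from the service.yaml above.
GAAS_SVC_SELECTOR: Dict[str, str] = {"app": "gaas"}


def selector_matches(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """Every selector key must be on the pod with the same value; extra pod labels are fine."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    print(selector_matches(GAAS_SVC_SELECTOR, {"app": "gaas", "tier": "api"}))  # True
    print(selector_matches(GAAS_SVC_SELECTOR, {"app": "gobblin-service"}))  # False
```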
diff --git a/gobblin-kubernetes/gobblin-service/base-cluster/storage.yaml b/gobblin-kubernetes/gobblin-service/base-cluster/storage.yaml
index 0765f98..3e98769 100644
--- a/gobblin-kubernetes/gobblin-service/base-cluster/storage.yaml
+++ b/gobblin-kubernetes/gobblin-service/base-cluster/storage.yaml
@@ -24,31 +24,3 @@ spec:
resources:
requests:
storage: 100Mi
----
-apiVersion: v1
-kind: PersistentVolume
-metadata:
- name: shared-template-catalogs-volume
-spec:
- capacity:
- storage: 50Mi
- volumeMode: Filesystem
- accessModes:
- - ReadWriteOnce
- persistentVolumeReclaimPolicy: Delete
- storageClassName: manual
- hostPath:
- path: "/tmp/templateCatalog"
----
-kind: PersistentVolumeClaim
-apiVersion: v1
-metadata:
- name: shared-template-catalogs-claim
-spec:
- accessModes:
- - ReadWriteOnce
- storageClassName: manual
- resources:
- requests:
- storage: 50Mi
-
diff --git a/gobblin-kubernetes/gobblin-service/mysql-cluster/application.yaml b/gobblin-kubernetes/gobblin-service/mysql-cluster/deployment.yaml
similarity index 89%
rename from gobblin-kubernetes/gobblin-service/mysql-cluster/application.yaml
rename to gobblin-kubernetes/gobblin-service/mysql-cluster/deployment.yaml
index 20a3226..c71a4ad 100644
--- a/gobblin-kubernetes/gobblin-service/mysql-cluster/application.yaml
+++ b/gobblin-kubernetes/gobblin-service/mysql-cluster/deployment.yaml
@@ -22,6 +22,9 @@ spec:
- name: shared-jobs
persistentVolumeClaim:
claimName: shared-jobs-claim
+ - name: flowconfig-templates
+ configMap:
+ name: flowconfig-templates
- name: gaas-config
configMap:
name: gaas-config
@@ -44,6 +47,8 @@ spec:
volumeMounts:
- name: shared-jobs
mountPath: /tmp/gobblin-as-service/jobs
+ - name: flowconfig-templates
+ mountPath: /tmp/templateCatalog
- name: gaas-config
mountPath: /home/gobblin/conf/gobblin-as-service/application.conf
subPath: gaas-application.conf
@@ -51,7 +56,7 @@ spec:
initContainers:
- name: init-mysql
image: busybox:1.28
- command: ["sh", "-c", "until nslookup mysql; do echo waiting for mysql; sleep 2; done;"]
+ command: ['sh', '-c', 'until nslookup mysql; do echo waiting for mysql; sleep 2; done;']
---
@@ -88,18 +93,3 @@ spec:
- name: standalone-config
mountPath: /home/gobblin/conf/standalone/application.conf
subPath: standalone-application.conf
----
-apiVersion: v1
-kind: Service
-metadata:
- name: gaas-svc
- labels:
- app: gobblin-service
-spec:
- type: NodePort
- ports:
- - port: 6956
- protocol: TCP
- targetPort: 6956
- selector:
- app: gaas
diff --git a/gobblin-kubernetes/gobblin-service/mysql-cluster/kustomization.yaml b/gobblin-kubernetes/gobblin-service/mysql-cluster/kustomization.yaml
index 9899123..cd3f446 100644
--- a/gobblin-kubernetes/gobblin-service/mysql-cluster/kustomization.yaml
+++ b/gobblin-kubernetes/gobblin-service/mysql-cluster/kustomization.yaml
@@ -1,7 +1,10 @@
+bases:
+ - ../base-cluster
resources:
- - application.yaml
- mysql-deployment.yaml
- mysql-pv.yaml
+patchesStrategicMerge:
+ - deployment.yaml
configMapGenerator:
- name: gaas-config
files:
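With `patchesStrategicMerge`, the mysql-cluster `deployment.yaml` is merged onto the base-cluster Deployment instead of replacing it. For lists such as `volumes` and `containers`, Kubernetes strategic merge patches match entries by their `name` field. The sketch below is a deliberately shallow, hypothetical model of that list merge; real strategic merge recurses into nested fields and supports patch directives that this sketch ignores, and `merge_list_by_name` is not a real library function.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def merge_list_by_name(base: List[Item], patch: List[Item]) -> List[Item]:
    """Merge two lists of objects keyed by their `name` field.

    Entries present in both are shallow-merged (patch fields win), entries only
    in the patch are appended, and the base order is preserved.
    """
    patch_by_name = {item["name"]: item for item in patch}
    merged = [{**item, **patch_by_name.get(item["name"], {})} for item in base]
    base_names = {item["name"] for item in base}
    merged.extend(item for item in patch if item["name"] not in base_names)
    return merged


if __name__ == "__main__":
    # Volumes as they appear in the base-cluster and mysql-cluster deployments.
    base_volumes = [
        {"name": "shared-jobs", "persistentVolumeClaim": {"claimName": "shared-jobs-claim"}},
        {"name": "flowconfig-templates", "configMap": {"name": "flowconfig-templates"}},
    ]
    patch_volumes = base_volumes + [{"name": "gaas-config", "configMap": {"name": "gaas-config"}}]
    print([v["name"] for v in merge_list_by_name(base_volumes, patch_volumes)])
    # ['shared-jobs', 'flowconfig-templates', 'gaas-config']
```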
diff --git a/gobblin-kubernetes/gobblin-service/mysql-cluster/mysql-deployment.yaml b/gobblin-kubernetes/gobblin-service/mysql-cluster/mysql-deployment.yaml
index a949979..ff11411 100644
--- a/gobblin-kubernetes/gobblin-service/mysql-cluster/mysql-deployment.yaml
+++ b/gobblin-kubernetes/gobblin-service/mysql-cluster/mysql-deployment.yaml
@@ -30,7 +30,7 @@ spec:
persistentVolumeClaim:
claimName: mysql-pv-claim
containers:
- - image: mysql:5.6
+ - image: mysql:5.6.45
name: mysql
env:
- name: MYSQL_RANDOM_ROOT_PASSWORD
diff --git a/gobblin-kubernetes/gobblin-service/mysql-cluster/mysql-pv.yaml b/gobblin-kubernetes/gobblin-service/mysql-cluster/mysql-pv.yaml
index 77d58d9..7f498d2 100644
--- a/gobblin-kubernetes/gobblin-service/mysql-cluster/mysql-pv.yaml
+++ b/gobblin-kubernetes/gobblin-service/mysql-cluster/mysql-pv.yaml
@@ -5,7 +5,6 @@ metadata:
labels:
type: local
spec:
- storageClassName: manual
capacity:
storage: 1Gi
accessModes:
@@ -18,7 +17,6 @@ kind: PersistentVolumeClaim
metadata:
name: mysql-pv-claim
spec:
- storageClassName: manual
accessModes:
- ReadWriteOnce
resources: