[GitHub] [incubator-liminal] aviemzur commented on a change in pull request #55: [LIMINAL-56] - Add default executor (k8s) for spark tasks

GitBox Mon, 12 Jul 2021 12:51:16 -0700


aviemzur commented on a change in pull request #55:
URL: https://github.com/apache/incubator-liminal/pull/55#discussion_r667906649




##########
File path: tests/runners/apps/test_spark_app/liminal.yml
##########
@@ -0,0 +1,39 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+---
+name: MyPipeline
+images:
+  - image: my_spark_image
+    type: spark
+    source: wordcount
+pipelines:
+  - pipeline: my_pipeline
+    owner: Bosco Albert Baracus
+    start_date: 1970-01-01
+    timeout_minutes: 45
+    schedule: 0 * 1 * *
+    tasks:
+      - task: my_test_spark_task
+        type: spark
+        description: spark task on k8s
+        image: my_spark_image
+        executors: 2
+        source: wordcount
+        application_source: wordcount.py

Review comment:
       It would be useful to have an example with a superliminal that lets users 
run their task on a different executor, such as EMR. In that case the 
`application_source` needs to be augmented by the superliminal to point to where 
the code is hosted, for example on S3.
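
A minimal sketch of what such an augmentation might look like, assuming the superliminal could supply a base URI for hosted code. The function name `resolve_application_source` and the `source_base_uri` setting are hypothetical and not part of Liminal; the bucket name is a placeholder.

```python
from typing import Optional


def resolve_application_source(task_config: dict,
                               source_base_uri: Optional[str] = None) -> str:
    """Return the location the executor should load the Spark application from.

    `source_base_uri` is a hypothetical value a superliminal could provide
    (for example "s3://my-bucket/apps"); it is not an existing Liminal key.
    """
    application_source = task_config["application_source"]
    if not source_base_uri:
        # Kubernetes-style case: the code already lives inside the image.
        return application_source
    # EMR-style case: point at the remotely hosted copy of the same file.
    return f"{source_base_uri.rstrip('/')}/{application_source.lstrip('/')}"


task = {"task": "my_test_spark_task", "type": "spark",
        "application_source": "wordcount.py"}
local_path = resolve_application_source(task)
remote_path = resolve_application_source(task, "s3://my-bucket/apps/")
```

With this shape the user's `liminal.yml` keeps the short `wordcount.py` value, and only the superliminal decides where that file is hosted.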

##########
File path: liminal/runners/airflow/model/task.py
##########
@@ -28,15 +28,14 @@ class Task(ABC):
     """
 
     def __init__(self, task_id, dag, parent, trigger_rule, liminal_config, pipeline_config,
-                 task_config, executor=None):
+                 task_config):
         self.liminal_config = liminal_config
         self.dag = dag
         self.pipeline_config = pipeline_config
         self.task_id = task_id
         self.parent = parent
         self.trigger_rule = trigger_rule
         self.task_config = task_config
-        self.executor = executor
 
     @abstractmethod
     def apply_task_to_dag(self, **kwargs):

Review comment:
       We should aim to have `apply_task_to_dag` live only in the executor.
   Tasks that currently carry this implementation could extend a superclass called 
`AirflowTask`, and `DefaultExecutor` could be renamed to `AirflowExecutor`. The 
executor would then call a method on `AirflowTask` that augments the DAG, which 
tasks like `create_cloudformation_stack` can implement.
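
A rough sketch of the split suggested above, under stated assumptions: the constructor signatures are simplified, `CreateCloudFormationStackTask` is a hypothetical class name, and a plain list stands in for the Airflow DAG so the example runs without Airflow installed.

```python
from abc import ABC, abstractmethod


class Task(ABC):
    """Plain task description with no knowledge of Airflow."""

    def __init__(self, task_id, task_config):
        self.task_id = task_id
        self.task_config = task_config


class AirflowTask(Task):
    """Hypothetical base class for tasks that add their own operators to the DAG."""

    @abstractmethod
    def apply_task_to_dag(self, dag, parent):
        """Add this task to `dag` and return the new leaf."""


class AirflowExecutor:
    """Hypothetical rename of DefaultExecutor: it delegates to the AirflowTask."""

    def apply_task_to_dag(self, task, dag, parent=None):
        if not isinstance(task, AirflowTask):
            raise TypeError(f"{type(task).__name__} cannot be run by AirflowExecutor")
        return task.apply_task_to_dag(dag, parent)


class CreateCloudFormationStackTask(AirflowTask):
    def apply_task_to_dag(self, dag, parent):
        # A real implementation would create Airflow operators here; appending
        # the task id to a list is only a stand-in for augmenting the DAG.
        dag.append(self.task_id)
        return self.task_id


dag = []
executor = AirflowExecutor()
leaf = executor.apply_task_to_dag(CreateCloudFormationStackTask("create_stack", {}), dag)
```

The point of the shape is that callers only ever talk to the executor, so `Task` itself no longer needs an abstract `apply_task_to_dag`.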

##########
File path: liminal/core/config/defaults/base/liminal.yml
##########
@@ -21,12 +21,24 @@ type: super
 executors:
   - executor: default_k8s
     type: kubernetes
+  - executor: default_executor
+    type: default
 service_defaults:
   description: add defaults parameters for all services
 task_defaults:
   description: add defaults parameters for all tasks separate by task type
   python:
     executor: default_k8s
+  spark:
+    executor: default_k8s
+  job_end:

Review comment:
       It would be better to change the code so that we don't need to list every 
task type here with `default_executor`; instead, the default one should be used 
whenever no other executor is found.
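
One possible shape for that fallback, as a sketch only: `executor_for` and the `DEFAULT_EXECUTOR` constant are hypothetical names, and the lookup order (task config, then `task_defaults`, then the default) is an assumption rather than Liminal's current behavior.

```python
DEFAULT_EXECUTOR = "default_executor"  # assumed to match the id declared under `executors`


def executor_for(task_type, task_defaults, task_config=None):
    """Pick an executor id: explicit task setting, then type default, then fallback."""
    task_config = task_config or {}
    if "executor" in task_config:
        return task_config["executor"]
    type_defaults = task_defaults.get(task_type)
    if isinstance(type_defaults, dict) and "executor" in type_defaults:
        return type_defaults["executor"]
    # No entry for this task type: fall back instead of requiring one per type.
    return DEFAULT_EXECUTOR


task_defaults = {
    "description": "add defaults parameters for all tasks separate by task type",
    "python": {"executor": "default_k8s"},
    "spark": {"executor": "default_k8s"},
}

spark_executor = executor_for("spark", task_defaults)
job_end_executor = executor_for("job_end", task_defaults)
```

With a lookup like this, `job_end` and similar task types would not need their own entries in the base `liminal.yml`.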

##########
File path: liminal/build/image/spark/Dockerfile
##########
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+FROM bitnami/spark:3.1.2-debian-10-r23
+
+USER root
+
+# Set the working directory to /app
+WORKDIR /app
+
+# Install any needed packages specified in requirements.txt
+COPY ./requirements.txt /app/

Review comment:
       What if the user isn't using PySpark and does not have a requirements file?
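
One way the build step might guard against this, sketched under the assumption that the image builder prepares a build context directory before running `docker build`. The helper `ensure_requirements_file` is hypothetical; it only keeps the `COPY` instruction from failing and assumes any later install step tolerates an empty file.

```python
import tempfile
from pathlib import Path


def ensure_requirements_file(build_context: Path) -> bool:
    """Create an empty requirements.txt if the source has none.

    Returns True when a placeholder file was created, False when the
    user already supplied one.
    """
    requirements = build_context / "requirements.txt"
    if requirements.exists():
        return False
    requirements.write_text("")
    return True


# Usage with a throwaway directory standing in for the build context.
with tempfile.TemporaryDirectory() as tmp:
    created_first = ensure_requirements_file(Path(tmp))
    created_second = ensure_requirements_file(Path(tmp))
```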




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
