dariabezkorovaina commented on code in PR #30351:
URL: https://github.com/apache/beam/pull/30351#discussion_r1497801704

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?
+Response:
+You can use Apache Beam for data ingestion, validation, preprocessing, and inference.

Review Comment:
<img width="873" alt="image" src="https://github.com/apache/beam/assets/99457894/30fb51d6-d149-49ce-b9ce-7acd577f7756">
Data ingestion is outside of the Beam model according to this graph? Suggest changing to: "You can use Apache Beam for various tasks within your AI/ML pipeline, including data validation, preprocessing, model validation, and model deployment and inference."
According to the documentation: https://beam.apache.org/documentation/ml/about-ml/

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?
+Response:
+You can use Apache Beam for data ingestion, validation, preprocessing, and inference.
+
+Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation.
+
+Apache Beam also has a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) for preprocessing data. This allows you to ensure data consistency using same steps for training and inference.
+
+Apache Beam allows using pre-trained models with [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [Tensorflow](https://www.tensorflow.org/).

Review Comment:
```suggestion
Additionally, Apache Beam allows integration with pre-trained models from [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [TensorFlow](https://www.tensorflow.org/).
```

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?
+Response:
+You can use Apache Beam for data ingestion, validation, preprocessing, and inference.
+
+Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation.
+
+Apache Beam also has a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) for preprocessing data. This allows you to ensure data consistency using same steps for training and inference.
+
+Apache Beam allows using pre-trained models with [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [Tensorflow](https://www.tensorflow.org/).
+
+Machine learning inference can be performed in Apache Beam using the `RunInference API`.
+[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform supports usage of any custom models. See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) for an example of running inference on a model loaded with [spaCy](https://spacy.io/).

Review Comment:
```suggestion
[`RunInference`](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) is a [`PTransform`](https://beam.apache.org/documentation/basics/#ptransform) that supports the usage of any custom models.
For an example of running inference on a custom model loaded with [spaCy](https://spacy.io/), refer to the [Bring your own ML model to Beam RunInference](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) example in the Apache Beam GitHub repository.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:
+1. Create KFP components by specifying interfaces and containerizing the component implementation.
+2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components.
+3. Compile and run KFP pipeline by submitting it to a KFP client endpoint.

Review Comment:
```suggestion
3. Compile and run the KFP pipeline by submitting it to a KFP client endpoint.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:
+1. Create KFP components by specifying interfaces and containerizing the component implementation.
+2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components.
+3. Compile and run KFP pipeline by submitting it to a KFP client endpoint.
+
+Following is an example of a KFP pipeline that orchestrates an Apache Beam preprocessing pipeline:
+```
+ kfp
+ ├── pipeline.py
+ ├── components
+ │   └── preprocess
+ │       ├── Dockerfile
+ │       ├── component.yaml
+ │       ├── requirements.txt
+ │       └── src
+ │           └── preprocess.py
+ └── requirements.txt
+```
+
+Apache Beam pipelines can be used as `KFP` components consisting of a yaml specification `component.yaml` and a python source file `preprocess.py`. The yaml file specifies the input and output arguments of the component, and the python file contains the Apache Beam pipeline code.
+Because `KFP` provides the input and output arguments as command-line arguments, an `argumentparser` is needed. Component logic and requirements are containerized and the container image is pushed to a container registry.
+
+As a final step, the KFP pipeline is compiled to a json file and submitted to a KFP client endpoint in `pipeline.py`.
+
+```python
+

Review Comment:
```suggestion
```

##########
learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with TFX?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).

Review Comment:
```suggestion
However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
```

##########
learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with TFX?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).

Review Comment:
I proofread the introduction ('About Beam'), but it does not look relevant in these orchestration prompts because the prompts do not ask what Apache Beam is. We also did not add this introduction about Beam to any other prompts. The prompt suggests people already know what Beam is.
Possible suggestions:
A) Move this introduction to the 33_ai_ml_pipelines and adjust its prompt to "What is Apache Beam and how can I use it for my AI/ML pipeline?".
B) Keep it in the orchestration prompts but change the prompts to "What is Apache Beam and ..."
Otherwise, we'll need to include this part in all our prompts?

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?
+Response:
+You can use Apache Beam for data ingestion, validation, preprocessing, and inference.
+
+Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation.
+
+Apache Beam also has a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) for preprocessing data. This allows you to ensure data consistency using same steps for training and inference.
+
+Apache Beam allows using pre-trained models with [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [Tensorflow](https://www.tensorflow.org/).
+
+Machine learning inference can be performed in Apache Beam using the `RunInference API`.

Review Comment:
```suggestion
To execute machine learning inference tasks, Apache Beam provides the RunInference API.
```

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?

Review Comment:
```suggestion
How can I use Apache Beam in my AI/ML pipeline?
```

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?
+Response:
+You can use Apache Beam for data ingestion, validation, preprocessing, and inference.
+
+Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation.

Review Comment:
```suggestion
Apache Beam offers a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) that allow for reading and writing data from and to various data sources and sinks, as well as performing data validation.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?

Review Comment:
```suggestion
How can I orchestrate Apache Beam pipelines with KubeFlow?
```

##########
learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with TFX?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.

Review Comment:
```suggestion
You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference.
```

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?
+Response:
+You can use Apache Beam for data ingestion, validation, preprocessing, and inference.
+
+Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation.
+
+Apache Beam also has a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) for preprocessing data. This allows you to ensure data consistency using same steps for training and inference.
+
+Apache Beam allows using pre-trained models with [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [Tensorflow](https://www.tensorflow.org/).
+
+Machine learning inference can be performed in Apache Beam using the `RunInference API`.
+[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform supports usage of any custom models. See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) for an example of running inference on a model loaded with [spaCy](https://spacy.io/).
+
+`RunInference` works well on arbitrary large models. See [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) for example of deployment and inference on LLM.
+
+For more information on building AI/ML pipelines see [here](https://beam.apache.org/documentation/ml/overview/).

Review Comment:
```suggestion
For more information on implementing AI/ML pipelines using Apache Beam, see the [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) section in the Apache Beam documentation.
```

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?
+Response:
+You can use Apache Beam for data ingestion, validation, preprocessing, and inference.
+
+Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation.
+
+Apache Beam also has a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) for preprocessing data. This allows you to ensure data consistency using same steps for training and inference.

Review Comment:
```suggestion
For data preprocessing, Apache Beam provides the [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) class. This feature allows you to ensure data consistency by applying the same preprocessing steps for both training and inference.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).

Review Comment:
```suggestion
However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [Kubeflow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
```

##########
learning/prompts/documentation-lookup/33_ai_ml_pipelines.md:
##########
@@ -0,0 +1,20 @@
+Prompt:
+How can I use Apache Beam in AI/ML pipeline?
+Response:
+You can use Apache Beam for data ingestion, validation, preprocessing, and inference.
+
+Apache Beam has a rich set of [I/O connectors](https://beam.apache.org/documentation/io/connectors/) and [transforms](https://beam.apache.org/documentation/transforms/python/) which allow to read and write data from/to various data sources and perform data validation.
+
+Apache Beam also has a [MLTransform](https://beam.apache.org/documentation/ml/preprocess-data/) for preprocessing data. This allows you to ensure data consistency using same steps for training and inference.
+
+Apache Beam allows using pre-trained models with [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/stable/), and [Tensorflow](https://www.tensorflow.org/).
+
+Machine learning inference can be performed in Apache Beam using the `RunInference API`.
+[RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) transform supports usage of any custom models. See [here](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) for an example of running inference on a model loaded with [spaCy](https://spacy.io/).
+
+`RunInference` works well on arbitrary large models. See [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) for example of deployment and inference on LLM.

Review Comment:
```suggestion
The `RunInference` transform efficiently handles models of any size, making it suitable for large language models (LLMs) and other complex architectures. You can find an example of deploying and performing inference on large language models (LLMs) in the [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) section of the Apache Beam documentation.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:
+1. Create KFP components by specifying interfaces and containerizing the component implementation.
+2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components.
+3. Compile and run KFP pipeline by submitting it to a KFP client endpoint.
+
+Following is an example of a KFP pipeline that orchestrates an Apache Beam preprocessing pipeline:

Review Comment:
```suggestion
Here is an example of a KFP pipeline orchestrating an Apache Beam preprocessing pipeline:
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).

Review Comment:
```suggestion
To execute a pipeline, you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).

Review Comment:
```suggestion
Apache Beam is an open-source, unified programming model for defining batch and streaming data-parallel processing pipelines. Each Apache Beam pipeline is represented as a Directed Acyclic Graph (DAG) of data processing operations constructed using the Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.

Review Comment:
```suggestion
You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:

Review Comment:
```suggestion
To run an Apache Beam pipeline as a part of a Kubeflow Pipeline (KFP), you need to:
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.

Review Comment:
```suggestion
You can integrate an Apache Beam pipeline into a Kubeflow pipeline, where the Apache Beam pipeline DAG serves as a node within the Kubeflow pipeline DAG.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:
+1. Create KFP components by specifying interfaces and containerizing the component implementation.
+2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components.
+3. Compile and run KFP pipeline by submitting it to a KFP client endpoint.
+
+Following is an example of a KFP pipeline that orchestrates an Apache Beam preprocessing pipeline:
+```
+ kfp
+ ├── pipeline.py
+ ├── components
+ │   └── preprocess
+ │       ├── Dockerfile
+ │       ├── component.yaml
+ │       ├── requirements.txt
+ │       └── src
+ │           └── preprocess.py
+ └── requirements.txt
+```
+
+Apache Beam pipelines can be used as `KFP` components consisting of a yaml specification `component.yaml` and a python source file `preprocess.py`. The yaml file specifies the input and output arguments of the component, and the python file contains the Apache Beam pipeline code.

Review Comment:
```suggestion
Apache Beam pipelines can be used as KFP components consisting of a YAML specification `component.yaml` and a Python source file `preprocess.py`. The YAML file defines the input and output arguments of the component, while the Python file contains the Apache Beam pipeline code.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.

Review Comment:
```suggestion
Kubeflow is an open-source ML platform tailored for MLOps, which comprises best practices and guiding principles to streamline the development and maintenance of ML workflows. Kubeflow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of DAGs. The DAGs orchestrate the scheduling and execution of pipeline steps and facilitate the transfer of execution parameters, metadata, and artifacts between steps.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:
+1. Create KFP components by specifying interfaces and containerizing the component implementation.
+2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components.

Review Comment:
```suggestion
2. Create the KFP pipeline by connecting components and defining how inputs and outputs are exchanged between them.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:
+1. Create KFP components by specifying interfaces and containerizing the component implementation.
+2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components.
+3. Compile and run KFP pipeline by submitting it to a KFP client endpoint.
+
+Following is an example of a KFP pipeline that orchestrates an Apache Beam preprocessing pipeline:
+```
+ kfp
+ ├── pipeline.py
+ ├── components
+ │   └── preprocess
+ │       ├── Dockerfile
+ │       ├── component.yaml
+ │       ├── requirements.txt
+ │       └── src
+ │           └── preprocess.py
+ └── requirements.txt
+```
+
+Apache Beam pipelines can be used as `KFP` components consisting of a yaml specification `component.yaml` and a python source file `preprocess.py`. The yaml file specifies the input and output arguments of the component, and the python file contains the Apache Beam pipeline code.
+Because `KFP` provides the input and output arguments as command-line arguments, an `argumentparser` is needed. Component logic and requirements are containerized and the container image is pushed to a container registry.
+
+As a final step, the KFP pipeline is compiled to a json file and submitted to a KFP client endpoint in `pipeline.py`.

Review Comment:
```suggestion
As a final step, the KFP pipeline is compiled into a JSON file and submitted to a KFP client endpoint using `pipeline.py`.
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).

Review Comment:
I proofread the introduction ('About Beam'), but it does not look relevant in these orchestration prompts because the prompts do not ask what Apache Beam is. We also did not add this introduction about Beam to any other prompts. The prompt suggests people already know what Beam is.
Possible suggestions:
A) Move this introduction to the 33_ai_ml_pipelines and adjust its prompt to "What is Apache Beam and how can I use it for my AI/ML pipeline?".
B) Keep it in the orchestration prompts but change the prompts to "What is Apache Beam and ..."
Otherwise, we'll need to include this part in all our prompts?

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:
+1. Create KFP components by specifying interfaces and containerizing the component implementation.
+2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components.
+3. Compile and run KFP pipeline by submitting it to a KFP client endpoint.
+
+Following is an example of a KFP pipeline that orchestrates an Apache Beam preprocessing pipeline:
+```
+ kfp
+ ├── pipeline.py
+ ├── components
+ │   └── preprocess
+ │       ├── Dockerfile
+ │       ├── component.yaml
+ │       ├── requirements.txt
+ │       └── src
+ │           └── preprocess.py
+ └── requirements.txt
+```
+
+Apache Beam pipelines can be used as `KFP` components consisting of a yaml specification `component.yaml` and a python source file `preprocess.py`. The yaml file specifies the input and output arguments of the component, and the python file contains the Apache Beam pipeline code.
+Because `KFP` provides the input and output arguments as command-line arguments, an `argumentparser` is needed. Component logic and requirements are containerized and the container image is pushed to a container registry.
+
+As a final step, the KFP pipeline is compiled to a json file and submitted to a KFP client endpoint in `pipeline.py`.
+
+```python
+
+Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
+
+client = kfp.Client()
+experiment = client.create_experiment("KFP orchestration example")
+run_result = client.run_pipeline(
+    experiment_id=experiment.id,
+    job_name="KFP orchestration job",
+    pipeline_package_path="pipeline.json",
+    params=run_arguments)
+```
+See [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/ml-orchestration/kfp) for a complete example of orchestrating Apache Beam preprocessing pipeline with KubeFlow.

Review Comment:
```suggestion
For a complete example of orchestrating an Apache Beam preprocessing pipeline with Kubeflow, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/ml-orchestration/kfp).
```

##########
learning/prompts/documentation-lookup/36_ai_kfp_orchestration.md:
##########
@@ -0,0 +1,49 @@
+Prompt:
+How can I orchestrate Apache Beam pipelines with KubeFlow?
+Response:
+Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.Each Apache Beam pipeline is a Directed Acyclic Graph (DAG) of data processing operations constructed using Apache Beam SDK in one of the [supported languages](https://beam.apache.org/documentation/sdks/java/).
+To run a pipeline you need to deploy it to one of the supported [Beam runners](https://beam.apache.org/documentation/runners/capability-matrix/).
+You can use Apache Beam as an integral part of a machine learning project for data ingestion, validation, preprocessing, model validation, and inference.
+A full ML workflow also contains other steps such as data exploration, feature engineering, model training, and model deployment. Furthermore, ML workflow needs to track metadata and artifacts for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).
+
+ `KubeFlow` is an open source ML platform that is designed to enable MLOps - a set of best practices and guiding principles aimed to streamline development and maintenance of ML systems. KubeFlow provides a set of tools for building, deploying, and managing end-to-end ML pipelines in the form of a `DAG` responsible for scheduling and running the pipeline steps and passing execution parameters, metadata, and artifacts between the steps.
+
+You can make Apache Beam pipeline part of a Kubeflow pipeline. In this case Apache Beam pipeline `DAG` becomes a node in the Kubeflow pipeline `DAG`.
+
+In order to run an Apache Beam pipeline as part of a Kubeflow pipeline you need to:
+1. Create KFP components by specifying interfaces and containerizing the component implementation.
+2. Create KFP pipeline by connecting components, and defining how inputs and outputs are passed between the components.
+3. Compile and run KFP pipeline by submitting it to a KFP client endpoint.
+
+Following is an example of a KFP pipeline that orchestrates an Apache Beam preprocessing pipeline:
+```
+ kfp
+ ├── pipeline.py
+ ├── components
+ │   └── preprocess
+ │       ├── Dockerfile
+ │       ├── component.yaml
+ │       ├── requirements.txt
+ │       └── src
+ │           └── preprocess.py
+ └── requirements.txt
+```
+
+Apache Beam pipelines can be used as `KFP` components consisting of a yaml specification `component.yaml` and a python source file `preprocess.py`. The yaml file specifies the input and output arguments of the component, and the python file contains the Apache Beam pipeline code.
+Because `KFP` provides the input and output arguments as command-line arguments, an `argumentparser` is needed. Component logic and requirements are containerized and the container image is pushed to a container registry.

Review Comment:
```suggestion
Since KFP provides input and output arguments as command-line arguments, an `ArgumentParser` is required. The component logic and requirements are containerized, and the container image is pushed to a container registry.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
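
---

Editor's addendum: to make the `RunInference` discussion in the comments above concrete, here is a minimal sketch of a Beam pipeline that applies `RunInference` with a scikit-learn model handler. This is not code from the PR; the model URI and the example input are hypothetical placeholders.

```python
# Minimal RunInference sketch, assuming a pickled scikit-learn model.
# The model URI below is a hypothetical placeholder, not a path from the PR.
import numpy as np

import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

# Wrap the model so RunInference knows how to load it and batch inputs.
model_handler = SklearnModelHandlerNumpy(model_uri="gs://my-bucket/model.pkl")

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateExamples" >> beam.Create([np.array([1.0, 2.0, 3.0])])
        | "RunInference" >> RunInference(model_handler)
        | "LogPredictions" >> beam.Map(print)  # each element is a PredictionResult
    )
```

Handlers following the same pattern exist for PyTorch and TensorFlow models, which is what makes the transform model-agnostic.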
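Likewise, the final review comment's point that KFP passes a component's inputs and outputs as command-line arguments can be illustrated with a sketch of a `preprocess.py` entry point. The argument names `--input-path` and `--output-path` and the trivial preprocessing step are illustrative assumptions, not code from the PR.

```python
# Hypothetical sketch of a KFP component entry point (preprocess.py).
# KFP supplies the component's inputs and outputs as command-line arguments;
# the argument names below are illustrative.
import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def main():
    parser = argparse.ArgumentParser(description="Beam preprocessing component")
    parser.add_argument("--input-path", required=True, help="Input data location")
    parser.add_argument("--output-path", required=True, help="Output data location")
    args, beam_args = parser.parse_known_args()

    # Any remaining arguments are forwarded to Beam as pipeline options.
    with beam.Pipeline(options=PipelineOptions(beam_args)) as pipeline:
        _ = (
            pipeline
            | "Read" >> beam.io.ReadFromText(args.input_path)
            | "Preprocess" >> beam.Map(str.strip)  # placeholder transform
            | "Write" >> beam.io.WriteToText(args.output_path)
        )


if __name__ == "__main__":
    main()
```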
