U-Ozdemir opened a new pull request #8968: URL: https://github.com/apache/airflow/pull/8968
This is my first time working with an open source project, and I am posting my first attempt at an ML operator guide here. It is still a work in progress, but feedback is always welcome.

# Guide for ML operator (Work in progress)

## AI Platform:

- The [AI Platform](https://cloud.google.com/ai-platform/docs/technical-overview) is used to train your machine learning models at scale, to host your trained model in the cloud, and to use your model to make predictions about new data.
- Machine learning (ML) is a subfield of artificial intelligence (AI). The goal of ML is to make computers learn from the data that you give them. Instead of writing code that describes the action the computer should take, your code provides an algorithm that adapts based on examples of intended behavior. The resulting program, consisting of the algorithm and the associated learned parameters, is called a trained model.

## Prerequisite Tasks

To use these operators, you must do a few things:

- Select or create a Cloud Platform project using the [Cloud Console](https://console.cloud.google.com/cloud-resource-manager).
- Enable billing for your project, as described in the [Google Cloud documentation](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project).
- Enable the API, as described in the [Cloud Console documentation](https://cloud.google.com/apis/docs/getting-started).
- Install the API libraries via pip:

  > pip install 'apache-airflow[gcp]'

  Detailed information is available in the [Installation](https://airflow.readthedocs.io/en/latest/installation.html) documentation.
- Set up a [connection](https://airflow.readthedocs.io/en/latest/howto/connection/gcp.html).
## Operators

- You will need to set up a Python dictionary containing the arguments that are applied to all the tasks in your workflow by using `default_args`.
- `start_date` determines the execution date of the first DAG task instance.
- `params` is a dictionary of DAG-level parameters that are made accessible in templates, namespaced under `params`. These params can be overridden at the task level.

```
default_args = {
    "start_date": days_ago(1),
    "params": {
        "model_name": MODEL_NAME
    }
}
```

### MLEngineManageModelOperator

- Use the **MLEngineManageModelOperator** to create an ML model. The `task_id` is a unique name for the task; it is basically a description of what your task does (creating a model in this case). `project_id` refers to the ID of your Google Cloud project, and the `name` field of `model` is the name you have given your model.

```
create_model = MLEngineManageModelOperator(
    task_id="create-model",
    project_id=PROJECT_ID,
    operation='create',
    model={
        "name": MODEL_NAME,
    },
)
```

### MLEngineCreateVersionOperator

- With the **MLEngineCreateVersionOperator** a version of a model can be created. The `task_id` is a unique, meaningful id for the task, `project_id` is the ID of the project it refers to, `model_name` refers to the name of your model, and `version` contains the information you give to this version of the model. (Do I need to explain the arguments in `version`? I don't see this in other examples.)
```
create_version = MLEngineCreateVersionOperator(
    task_id="create-version",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    version={
        "name": "v1",
        "description": "First-version",
        "deployment_uri": '{}/keras_export/'.format(JOB_DIR),
        "runtime_version": "1.14",
        "machineType": "mls1-c1-m2",
        "framework": "TENSORFLOW",
        "pythonVersion": "3.5"
    }
)
```

### MLEngineDeleteVersionOperator

- Use the **MLEngineDeleteVersionOperator** to delete a version of your ML model. The `task_id` is a unique name for the task, `project_id` refers to the ID of your project, `model_name` refers to the name you have given your model, and `version_name` is the version of the model you want to delete.

```
delete_version = MLEngineDeleteVersionOperator(
    task_id="delete-version",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    version_name="v1"
)
```

### MLEngineDeleteModelOperator

- The **MLEngineDeleteModelOperator** deletes your whole ML model. The `task_id` is a descriptive name given to the task, `project_id` refers to the ID of your project, and `model_name` refers to the name you have given your model. When `delete_contents` is set to `True`, everything within your ML model (all of its versions) is deleted as well.

```
delete_model = MLEngineDeleteModelOperator(
    task_id="delete-model",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    delete_contents=True
)
```

---

Make sure to mark the boxes below before creating PR:

- [x] Description above provides context of the change
- [ ] Unit tests coverage for changes (not needed for documentation changes)
- [ ] Target Github ISSUE in description if exists
- [ ] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
- [ ] Relevant documentation is updated including usage instructions.
- [ ] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
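One detail from the `default_args` section of the guide worth making concrete is that `params` set at the DAG level can be overridden at the task level. A plain-Python sketch of that merge behaviour (no Airflow required; the dictionary names and values here are made-up placeholders, not values from the guide):

```python
# Toy illustration: task-level params override the DAG-level defaults,
# while unrelated DAG-level keys are kept unchanged.
default_args = {
    "params": {"model_name": "my_model", "region": "us-central1"},
}

def effective_params(dag_level, task_level):
    """Merge DAG-level params with task-level overrides (task wins)."""
    merged = dict(dag_level)        # start from the DAG-level defaults
    merged.update(task_level or {}) # task-level entries replace matching keys
    return merged

print(effective_params(default_args["params"], {"model_name": "other_model"}))
# {'model_name': 'other_model', 'region': 'us-central1'}
```

This is only an analogy for the override rule described in the guide, not Airflow's actual merge code.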
---

In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.

---

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org