U-Ozdemir opened a new pull request #8968: URL: https://github.com/apache/airflow/pull/8968
This is my first time working with an open source project, and I am posting my first attempt at an ML operator guide here. It is still a work in progress, but feedback is always welcome.

# Guide for ML operator (Work in progress)

## AI Platform:

- The [AI Platform](https://cloud.google.com/ai-platform/docs/technical-overview) is used to train your machine learning models at scale, to host your trained model in the cloud, and to use your model to make predictions about new data.
- Machine learning (ML) is a subfield of artificial intelligence (AI). The goal of ML is to make computers learn from the data that you give them. Instead of writing code that describes the action the computer should take, your code provides an algorithm that adapts based on examples of intended behavior. The resulting program, consisting of the algorithm and the associated learned parameters, is called a trained model.

## Prerequisite Tasks

To use these operators, you must do a few things:

- Select or create a Cloud Platform project using the [Cloud Console](https://console.cloud.google.com/cloud-resource-manager).
- Enable billing for your project, as described in the [Google Cloud documentation](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project).
- Enable the API, as described in the [Cloud Console documentation](https://cloud.google.com/apis/docs/getting-started).
- Install the API libraries via pip:

  > pip install 'apache-airflow[gcp]'

  Detailed information is available in the [Installation](https://airflow.readthedocs.io/en/latest/installation.html) documentation.
- Set up a [connection](https://airflow.readthedocs.io/en/latest/howto/connection/gcp.html).
## Operators

- You will need to set up a Python dictionary containing the arguments that are applied to all the tasks in your workflow by using `default_args`.
- `start_date` determines the execution date of the first DAG task instance.
- `params` is a dictionary of DAG-level parameters that are made accessible in templates, namespaced under `params`. These params can be overridden at the task level.

```
default_args = {
    "start_date": days_ago(1),
    "params": {
        "model_name": MODEL_NAME
    }
}
```

### MLEngineManageModelOperator

- Use the **MLEngineManageModelOperator** to create an ML model. The `task_id` is a unique name for the task; it is basically a description of what your task does (creating a model in this case). `project_id` refers to the ID of your Google Cloud project, and the `name` field of `model` is the name you have given your model.

```
create_model = MLEngineManageModelOperator(
    task_id="create-model",
    project_id=PROJECT_ID,
    operation='create',
    model={
        "name": MODEL_NAME,
    },
)
```

### MLEngineCreateVersionOperator

- With the **MLEngineCreateVersionOperator** a version of a model can be created. The `task_id` is a unique, meaningful id for the task, `project_id` is the ID of the project it refers to, `model_name` refers to the name of your model, and `version` contains the information you give to this version of the model. (Do I need to explain the arguments in `version`? I don't see this in other examples.)
```
create_version = MLEngineCreateVersionOperator(
    task_id="create-version",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    version={
        "name": "v1",
        "description": "First-version",
        "deployment_uri": '{}/keras_export/'.format(JOB_DIR),
        "runtime_version": "1.14",
        "machineType": "mls1-c1-m2",
        "framework": "TENSORFLOW",
        "pythonVersion": "3.5"
    }
)
```

### MLEngineDeleteVersionOperator

- Use the **MLEngineDeleteVersionOperator** to delete a version of your ML model. The `task_id` is a unique name for the task, `project_id` refers to the ID of your project, `model_name` refers to the name you have given your model, and `version_name` is the version of the model you want to delete.

```
delete_version = MLEngineDeleteVersionOperator(
    task_id="delete-version",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    version_name="v1"
)
```

### MLEngineDeleteModelOperator

- The **MLEngineDeleteModelOperator** deletes your whole ML model. The `task_id` is a descriptive name given to the task, `project_id` refers to the ID of your project, and `model_name` refers to the name you have given your model. When `delete_contents` is set to `True`, everything within your ML model (all of its versions) is deleted as well.

```
delete_model = MLEngineDeleteModelOperator(
    task_id="delete-model",
    project_id=PROJECT_ID,
    model_name=MODEL_NAME,
    delete_contents=True
)
```

---

Make sure to mark the boxes below before creating PR:

- [x] Description above provides context of the change
- [ ] Unit tests coverage for changes (not needed for documentation changes)
- [ ] Target Github ISSUE in description if exists
- [ ] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
- [ ] Relevant documentation is updated including usage instructions.
- [ ] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
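One detail from the `default_args` section of the guide worth making concrete is that `params` set at the DAG level can be overridden at the task level. A plain-Python sketch of that merge behaviour (no Airflow required; the dictionary names and values here are made-up placeholders, not values from the guide):

```python
# Toy illustration: task-level params override the DAG-level defaults,
# while unrelated DAG-level keys are kept unchanged.
default_args = {
    "params": {"model_name": "my_model", "region": "us-central1"},
}

def effective_params(dag_level, task_level):
    """Merge DAG-level params with task-level overrides (task wins)."""
    merged = dict(dag_level)        # start from the DAG-level defaults
    merged.update(task_level or {}) # task-level entries replace matching keys
    return merged

print(effective_params(default_args["params"], {"model_name": "other_model"}))
# {'model_name': 'other_model', 'region': 'us-central1'}
```

This is only an analogy for the override rule described in the guide, not Airflow's actual merge code.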
---

In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.

---

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org