Zhengliang Zhu created AIRFLOW-4894:
---------------------------------------

             Summary: Add hook and operator for GCP Data Loss Prevention API
                 Key: AIRFLOW-4894
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4894
             Project: Apache Airflow
          Issue Type: New Feature
          Components: api, gcp, hooks, operators, tests
    Affects Versions: 1.10.3
            Reporter: Zhengliang Zhu
            Assignee: Zhengliang Zhu


Add a hook and operator to manipulate and use the Google Cloud Data Loss 
Prevention (DLP) API. The DLP API allows users to inspect or redact sensitive 
data in text content or GCP storage locations.

The hook wraps the following API methods, implemented with the Google API 
discovery service:
 * inspect/deidentify/reidentify for content: 
[https://cloud.google.com/dlp/docs/reference/rest/v2/projects.content]
 * create/delete/get/list/patch for inspectTemplates: 
[https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.inspectTemplates],
 [https://cloud.google.com/dlp/docs/reference/rest/v2/projects.inspectTemplates]
 * create/delete/get/list/patch for storedInfoTypes: 
[https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.storedInfoTypes],
 [https://cloud.google.com/dlp/docs/reference/rest/v2/projects.storedInfoTypes]
 * create/list/get/delete/cancel for dlpJobs: 
[https://cloud.google.com/dlp/docs/reference/rest/v2/projects.dlpJobs]
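
As a rough illustration of how a discovery-based call to projects.content.inspect 
might look, here is a minimal sketch. It is not the code from the PR: the 
function names build_dlp_service and inspect_content are hypothetical, and the 
sketch assumes the google-api-python-client package is installed and that 
default credentials are available when the real service is built.

```python
def build_dlp_service():
    """Build a discovery-based DLP v2 client.

    Assumption: google-api-python-client is installed and application
    default credentials are configured. Imported lazily so the rest of
    this sketch can be exercised without the library.
    """
    from googleapiclient.discovery import build

    return build("dlp", "v2", cache_discovery=False)


def inspect_content(service, project_id, text, info_types):
    """Call projects.content.inspect for a piece of text.

    `service` is any object exposing the discovery-style call chain
    projects().content().inspect(...).execute(); hypothetical helper,
    not the hook's actual method.
    """
    body = {
        # The text to scan for sensitive data.
        "item": {"value": text},
        # Which infoTypes to look for, e.g. EMAIL_ADDRESS.
        "inspectConfig": {
            "infoTypes": [{"name": name} for name in info_types],
        },
    }
    request = service.projects().content().inspect(
        parent="projects/{}".format(project_id),
        body=body,
    )
    return request.execute()
```

Passing the service object in as a parameter keeps the helper easy to unit test 
with a stub, which is one plausible way to structure the hook's tests.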

The operator creates a long-running DLP job (for storage inspection or risk 
analysis), then polls its status until the job is done, canceled, or 
deleted.
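
The polling behaviour described above could be sketched roughly as follows. 
This is an illustration under assumptions, not the operator's actual code: 
wait_for_dlp_job, JobNotFound, and the get_job callable are hypothetical names, 
and the sketch assumes that a deleted job surfaces as an error from the get 
call. The state names DONE, CANCELED, and FAILED are taken from the DLP job 
states in the REST reference.

```python
import time


class JobNotFound(Exception):
    """Hypothetical error raised by get_job when the job no longer exists."""


# Job states after which no further polling is useful.
TERMINAL_STATES = {"DONE", "CANCELED", "FAILED"}


def wait_for_dlp_job(get_job, job_name, poll_interval=10.0, timeout=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll a DLP job until it reaches a terminal state or disappears.

    get_job is a callable taking the job name and returning a dict with
    a "state" key (assumed shape of the dlpJobs.get response).
    Returns the final state, or "DELETED" if the job is no longer found.
    """
    deadline = clock() + timeout
    while True:
        try:
            job = get_job(job_name)
        except JobNotFound:
            # The job was deleted while we were waiting.
            return "DELETED"
        state = job.get("state")
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(
                "DLP job {} still in state {} after {}s".format(
                    job_name, state, timeout))
        sleep(poll_interval)
```

The sleep and clock parameters are injectable only so the loop can be tested 
without real delays; an operator would typically use the defaults.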

In addition to unit tests, the change was tested locally at the DAG level (not 
included in the PR).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)