nathadfield opened a new pull request #17321:
URL: https://github.com/apache/airflow/pull/17321


   Closes: #17032 
   
   *Recreated from #17252 
   
   `BigQueryInsertJobOperator` requires a `configuration` dict but, at 
present, the SQL inside it is rendered poorly, especially when the queries 
are large or complex.
   
   ### Current Example
   
   <img width="1670" alt="127119136-3d6f1f57-c941-4b06-b3f3-c6094b31f821" 
src="https://user-images.githubusercontent.com/967119/127535467-adf52f70-362b-4a97-b7ae-8755d2da2988.png">
   
   This PR lets specific elements of a dictionary passed to an operator be 
rendered individually, each with the appropriate renderer.
   
   For example, when the following configuration is passed to 
`BigQueryInsertJobOperator`, the provided SQL should be rendered separately, 
in addition to the JSON rendering of `configuration`.
   
   ```
   from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
   
   sql = '''
   INSERT INTO `target_project.target_dataset.target_table`
   (col1, col2, col3, col4)
   SELECT col1, col2, col3, col4
   FROM `source_project.source_dataset.source_table`
   '''
   
   test = BigQueryInsertJobOperator(
       task_id='test',
       gcp_conn_id='my_gcp_connection',
       configuration={"query": {"query": sql,
                                "destinationTable": {"projectId": 'my_project',
                                                     "datasetId": 'my_dataset',
                                                     "tableId": 'my_table'},
                                "writeDisposition": 'WRITE_TRUNCATE',
                                "useLegacySql": False}
                      }
   )
   ```
   
   To achieve this, the PR lets any operator declare a dot-separated path in 
`template_fields_renderers` along with the renderer type.  In this case we 
want `BigQueryInsertJobOperator` to extract the value from the `configuration` 
dict where the key path is `query.query`.  This is fully represented as follows:
   
    `template_fields_renderers = {"configuration": "json", "configuration.query.query": "sql"}`
   
   When building the rendered page, we check whether the content is a dict 
(e.g. `configuration`).  If so, `get_key_paths` returns a generator object 
that yields every possible path of keys in the form `key1.key2.key3...keyX`. 
   
   Using this, we can check whether any of these paths match an entry in 
`template_fields_renderers` and, if so, which renderer it names.  If it is 
a valid renderer, we use that path to retrieve the value from the 
configuration (`get_value_from_path`).
   
   ```
   if isinstance(content, dict):
       for dict_keys in get_key_paths(content):
           # Full path including the template field, e.g. "configuration.query.query"
           template_path = '.'.join((template_field, dict_keys))
           renderer = task.template_fields_renderers.get(template_path, template_path)
           # Only render paths that map to a known renderer
           if renderer in renderers:
               content_value = get_value_from_path(dict_keys, content)
               html_dict[template_path] = renderers[renderer](content_value)
   ```
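
For readers who want to see the two helpers in isolation, here is a minimal, standalone sketch of how `get_key_paths` and `get_value_from_path` could behave. It is an illustration under assumptions, not the code in this PR: the helper bodies are guesses based only on the description above, the sketch yields intermediate keys as well as leaf keys, and `renderers` is a hypothetical stand-in for Airflow's real renderer mapping.

```python
from typing import Any, Dict, Iterator


def get_key_paths(input_dict: Dict[str, Any]) -> Iterator[str]:
    """Yield a dot-separated path for every key in a nested dict (assumed behaviour)."""
    for key, value in input_dict.items():
        yield key
        if isinstance(value, dict):
            for sub_key in get_key_paths(value):
                yield f"{key}.{sub_key}"


def get_value_from_path(key_path: str, content: Dict[str, Any]) -> Any:
    """Walk `content` along a dot-separated path and return the value found there."""
    value: Any = content
    for key in key_path.split("."):
        value = value[key]
    return value


# Hypothetical inputs mirroring the example in this description.
configuration = {"query": {"query": "SELECT 1", "useLegacySql": False}}
template_fields_renderers = {
    "configuration": "json",
    "configuration.query.query": "sql",
}
# Stand-in renderer: the real ones produce highlighted HTML.
renderers = {"sql": lambda text: f"<sql>{text}</sql>"}

html_dict = {}
for dict_keys in get_key_paths(configuration):
    template_path = ".".join(("configuration", dict_keys))
    renderer = template_fields_renderers.get(template_path, template_path)
    if renderer in renderers:
        content_value = get_value_from_path(dict_keys, configuration)
        html_dict[template_path] = renderers[renderer](content_value)

print(html_dict)
```

Only `configuration.query.query` maps to a known renderer here, so it is the only nested entry rendered. Note that splitting on `.` means this sketch cannot address keys that themselves contain a dot.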
   
   The result of this is a rendered page which looks like the following:
   
   <img width="1670" alt="Screenshot 2021-07-26 at 14 51 48" 
src="https://user-images.githubusercontent.com/967119/127123123-93beb8dd-eac3-44db-adb8-1e5a1115259a.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
