eladkal commented on a change in pull request #7410: [AIRFLOW-6790] Add basic Tableau Integration URL: https://github.com/apache/airflow/pull/7410#discussion_r381519644
########## File path: airflow/providers/salesforce/example_dags/example_tableau_refresh_workbook.py ##########
@@ -0,0 +1,59 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This is an example DAG that performs a refresh operation on a Tableau Workbook (aka Extract). Since this is an
+asynchronous operation, we don't know when the operation actually finishes. That's why we have a second task
+that checks exactly that, so that you can perform further operations after the extract has been refreshed.
+"""
+from datetime import timedelta
+
+from airflow import DAG
+from airflow.providers.salesforce.operators.tableau_refresh_workbook import TableauRefreshWorkbookOperator
+from airflow.providers.salesforce.sensors.tableau_job_status import TableauJobStatusSensor
+from airflow.utils.dates import days_ago
+
+DEFAULT_ARGS = {
+    'owner': 'airflow',
+    'depends_on_past': False,
+    'start_date': days_ago(2),
+    'email': ['[email protected]'],
+    'email_on_failure': False,
+    'email_on_retry': False
+}
+
+with DAG(
+    dag_id='example_tableau_refresh_workbook',
+    default_args=DEFAULT_ARGS,
+    dagrun_timeout=timedelta(hours=2),
+    schedule_interval=None,
+    tags=['example'],
+) as dag:
+    task_refresh_workbook = TableauRefreshWorkbookOperator(
+        site_id='my_site',
+        workbook_name='MyWorkbook',
+        task_id='refresh_tableau_workbook',
+        dag=dag
+    )
+    task_check_job_status = TableauJobStatusSensor(

Review comment:

I think having a separate sensor, or making the waiting configurable, is crucial. When you perform an Extract, the processing of the job "moves" from the Airflow worker to the Tableau server, so there is no reason for the Airflow operator to keep occupying the worker needlessly. BashOperator is different: its script runs on the Airflow worker's own resources. It can also be used to connect to another service and wait on it, but there that is optional, unlike TableauRefreshWorkbookOperator, where the job always executes on the Tableau server's resources. Also think about a case where many DAGs use TableauRefreshWorkbookOperator at the same time: if they all occupy workers until their extracts finish, the entire Airflow cluster could be paralyzed. I think @feluelle's concern about a second refresh is a valid one.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
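The split the reviewer advocates — one task that only *starts* the refresh and returns a job id, and a separate check that only *asks* whether the job is done — can be sketched in plain Python. This is a minimal illustration of the pattern, not the Airflow or Tableau API: `start_refresh`, `job_finished`, and the in-memory `_JOBS` dict are hypothetical stand-ins for the operator's trigger call and the sensor's poke.

```python
import time

# Hypothetical in-memory "Tableau server" state: job_id -> finish time.
# In reality the Tableau server would track job status itself.
_JOBS = {}

def start_refresh(workbook_name, duration=0.5):
    """Kick off a refresh and return immediately with a job id.

    This is the operator's role in the reviewer's proposal: no waiting
    happens here, so the worker slot is freed right away.
    """
    job_id = 'job-{}-{}'.format(len(_JOBS), workbook_name)
    _JOBS[job_id] = time.monotonic() + duration
    return job_id

def job_finished(job_id):
    """One cheap status check (the sensor's 'poke').

    Returns True/False instead of blocking until completion, so many
    concurrent refreshes cost almost nothing between checks.
    """
    return time.monotonic() >= _JOBS[job_id]

job = start_refresh('MyWorkbook')
assert not job_finished(job)  # job is still running on the "server"
time.sleep(0.6)
assert job_finished(job)      # a later poke sees it completed
```

With this shape, the waiting lives entirely in the polling side; in Airflow terms, a sensor can additionally run with `mode='reschedule'` so the worker slot is released between pokes rather than held for the whole refresh.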
