[
https://issues.apache.org/jira/browse/AIRFLOW-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895291#comment-16895291
]
Elad commented on AIRFLOW-3503:
-------------------------------
I don't think this example code can ever work.
hook.delete() can only delete a single object. You can't specify /* and expect
it to delete everything under that path.
The proper way to achieve such functionality is something like:
{code:python}
def delete_folder(path_to_delete):
    """
    Delete files from Google Cloud Storage.
    """
    hook = GoogleCloudStorageHook(
        google_cloud_storage_conn_id=CONNECTION_ID)
    files = hook.list(
        bucket=GCS_BUCKET_ID,
        prefix=path_to_delete)
    for file in files:
        hook.delete(
            bucket=GCS_BUCKET_ID,
            object=file)
{code}
Maybe the best approach to resolve this is to do what delete_objects of
[S3Hook|https://github.com/apache/airflow/blob/master/airflow/hooks/S3_hook.py#L520]
does: delete_objects treats keys as a single file when it is a string and as
multiple files when it is a list.
With that approach you can use the output of list() directly as the input to
delete().
I think this would simplify the process significantly.
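A minimal sketch of that str-or-list dispatch (the class and helper names here are illustrative stand-ins, not the actual GoogleCloudStorageHook API):
{code:python}
class GCSHookSketch:
    """Illustrative stand-in for GoogleCloudStorageHook (not the real API)."""

    def __init__(self):
        self.deleted = []  # record deletions so the behavior is visible

    def _delete_single(self, bucket, obj):
        # The real hook would call the GCS client here.
        self.deleted.append((bucket, obj))

    def delete(self, bucket, objects):
        # Accept a single object name (str) or many (list),
        # mirroring how S3Hook.delete_objects handles its keys argument.
        if isinstance(objects, str):
            objects = [objects]
        for obj in objects:
            self._delete_single(bucket, obj)
{code}
With that shape, hook.delete(bucket=GCS_BUCKET_ID, objects=hook.list(bucket=GCS_BUCKET_ID, prefix=path_to_delete)) would delete the whole prefix in one call.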
> GoogleCloudStorageHook delete return success when nothing was done
> -------------------------------------------------------------------
>
> Key: AIRFLOW-3503
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3503
> Project: Apache Airflow
> Issue Type: Bug
> Components: gcp
> Affects Versions: 1.10.1
> Reporter: lot
> Assignee: Yohei Onishi
> Priority: Major
> Labels: gcp, gcs, hooks
>
> I'm loading files to BigQuery from Storage using:
>
> {code:python}
> gcs_export_uri = BQ_TABLE_NAME + '/' + EXEC_TIMESTAMP_PATH + '/*'
> gcs_to_bigquery_op = GoogleCloudStorageToBigQueryOperator(
>     dag=dag,
>     task_id='load_products_to_BigQuery',
>     bucket=GCS_BUCKET_ID,
>     destination_project_dataset_table=table_name_template,
>     source_format='NEWLINE_DELIMITED_JSON',
>     source_objects=[gcs_export_uri],
>     src_fmt_configs={'ignoreUnknownValues': True},
>     create_disposition='CREATE_IF_NEEDED',
>     write_disposition='WRITE_TRUNCATE',
>     skip_leading_rows=1,
>     google_cloud_storage_conn_id=CONNECTION_ID,
>     bigquery_conn_id=CONNECTION_ID)
> {code}
>
> After that I want to delete the files so I do:
> {code:python}
> def delete_folder():
>     """
>     Delete files from Google Cloud Storage.
>     """
>     hook = GoogleCloudStorageHook(
>         google_cloud_storage_conn_id=CONNECTION_ID)
>     hook.delete(
>         bucket=GCS_BUCKET_ID,
>         object=gcs_export_uri)
> {code}
>
>
> This runs with PythonOperator.
> The task is marked as Success even though nothing was deleted.
> Log:
> [2018-12-12 11:31:29,247] \{base_task_runner.py:98} INFO - Subtask:
> [2018-12-12 11:31:29,247] \{transport.py:151} INFO - Attempting refresh to
> obtain initial access_token [2018-12-12 11:31:29,249]
> \{base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,249]
> \{client.py:795} INFO - Refreshing access_token [2018-12-12 11:31:29,584]
> \{base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,583]
> \{python_operator.py:90} INFO - Done. Returned value was: None
>
>
> I expect the function to fail with something like "file was not found" if
> there is nothing to delete, or to let the user decide, via a dedicated flag,
> whether the task should fail or succeed when no files are found.
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)