dstandish commented on code in PR #29058:
URL: https://github.com/apache/airflow/pull/29058#discussion_r1087278596
##########
airflow/utils/db_cleanup.py:
##########
@@ -159,6 +170,14 @@ def _do_delete(*, query, orm_model, skip_archive, session):
logger.debug("delete statement:\n%s", delete.compile())
session.execute(delete)
session.commit()
+ if export_to_csv:
+ if not output_path.startswith(AIRFLOW_HOME):
+ output_path = os.path.join(AIRFLOW_HOME, output_path)
+ os.makedirs(output_path, exist_ok=True)
+ _to_csv(
+ target_table=target_table,
+            file_path=f"{output_path}/{target_table_name}.csv", session=session
+ )
+ skip_archive = True
Review Comment:
I just think it doesn't really hurt to leave the data backed up in a table.
A user might not understand that by asking for the data to be exported, it
will also be deleted irretrievably. What if something goes wrong with the
export? What if the user is running this with `kubectl exec -ti` and the pod
dies before they save the data? I would let `skip_archive` define the
archive-skipping behavior _always_ and not have the export behavior determine
it. The user can always drop the tables later, but they can't bring them back.
We also can't really be sure that our CSV export would actually be easily
importable back into the database without loss.
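To illustrate the decoupling I mean, here is a minimal sketch (the function body and default path are illustrative, not the actual Airflow implementation; only the flag names mirror this PR):

```python
import os

# Assumed fallback for illustration; in Airflow this comes from settings.
AIRFLOW_HOME = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))


def cleanup(*, export_to_csv: bool, skip_archive: bool,
            output_path: str = "exports") -> bool:
    """Sketch: `skip_archive` alone decides whether the backup table is kept.

    Returns the effective skip_archive value so the decoupling is visible.
    """
    if export_to_csv:
        # Resolve the export path under AIRFLOW_HOME, as in the PR diff.
        if not output_path.startswith(AIRFLOW_HOME):
            output_path = os.path.join(AIRFLOW_HOME, output_path)
        # ... CSV export would happen here ...
    # NOTE: no `skip_archive = True` override here -- exporting does not
    # imply dropping the archive table; the user's explicit choice stands.
    return skip_archive
```

With this shape, `cleanup(export_to_csv=True, skip_archive=False)` still keeps the archive table, so a failed or lost export is recoverable.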
BUT that's of course just one man's take
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]