namratachaudhary commented on code in PR #68213:
URL: https://github.com/apache/airflow/pull/68213#discussion_r3381350647
##########
task-sdk/src/airflow/sdk/bases/resumablejobmixin.py:
##########
@@ -101,41 +107,70 @@ def execute_resumable(self, context: Context) -> Any:
Closing this window would require atomic "submit + persist", which is
not possible across an external system boundary.
"""
- task_store = context.get("task_store")
-
- if task_store is None:
- self.log.warning("task_store not available in context, crash recovery is disabled for this run")
- else:
- external_id = task_store.get(self.external_id_key)
- if external_id:
- status = self.get_job_status(external_id, context)
- if self.is_job_active(status):
- self.log.info(
- "Reconnecting to existing job",
- external_id_key=self.external_id_key,
- external_id=external_id,
- status=status,
- )
- return self.poll_until_complete(external_id, context)
- if self.is_job_succeeded(status):
- self.log.info(
- "Job already completed successfully, skipping resubmission",
- external_id_key=self.external_id_key,
- external_id=external_id,
- )
- return self.get_job_result(external_id, context)
+ operator_tag = {"operator": type(self).__name__}
+ reconnect_to: Any = None
+ already_succeeded_id: Any = None
Review Comment:
When the mixin finds a stored job that already succeeded, it skips
resubmission. Right now this case only sets a trace span attribute
(already_succeeded), but there is no counter for it. Since the PR description
lists "skipped a duplicate" as one of the things you want to observe, and since
dashboards are usually built on counters (not spans), I'd suggest adding a
counter for this case too, so it's consistent with the reconnect and
fresh-submit counters.
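A minimal sketch of the suggestion follows. It assumes a stats client that exposes an `incr(name, tags=...)` method. The counter names, the status strings, and the `InMemoryStats` and `choose_path` helpers are all hypothetical and are not taken from the PR. The sketch keeps the three outcomes in one branch so that each run increments exactly one counter, and the already-succeeded path gets its own counter alongside the reconnect and fresh-submit ones.

```python
from collections import Counter
from typing import Any, Callable, Optional

# Hypothetical counter names; the PR's real metric names may differ.
RECONNECT = "resumable_job.reconnect"
ALREADY_SUCCEEDED = "resumable_job.already_succeeded"
FRESH_SUBMIT = "resumable_job.fresh_submit"


class InMemoryStats:
    """Hypothetical stand-in for a stats client with an incr(name, tags) method."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def incr(self, name: str, tags: Optional[dict] = None) -> None:
        # Key on the metric name plus its sorted tags, like a tagged counter.
        key = (name, tuple(sorted((tags or {}).items())))
        self.counts[key] += 1


def choose_path(
    stats: InMemoryStats,
    operator_name: str,
    task_store: Optional[dict],
    external_id_key: str,
    get_status: Callable[[Any], str],
) -> str:
    """Decide reconnect / skip / submit and count every outcome."""
    operator_tag = {"operator": operator_name}
    # No task_store means crash recovery is disabled, so fall through to submit.
    external_id = task_store.get(external_id_key) if task_store is not None else None
    if external_id:
        status = get_status(external_id)
        if status in {"QUEUED", "RUNNING"}:  # hypothetical "active" states
            stats.incr(RECONNECT, tags=operator_tag)
            return "reconnect"
        if status == "SUCCEEDED":  # hypothetical "succeeded" state
            # The case under review: emit a counter, not only a span attribute.
            stats.incr(ALREADY_SUCCEEDED, tags=operator_tag)
            return "already_succeeded"
    # Missing store, missing ID, or a terminal non-success status: resubmit.
    stats.incr(FRESH_SUBMIT, tags=operator_tag)
    return "fresh_submit"
```

Reusing the same `operator_tag` for all three counters means a dashboard can show the reconnect / skipped-duplicate / fresh-submit split per operator from counters alone, without querying spans.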