ashb commented on issue #42136: URL: https://github.com/apache/airflow/issues/42136#issuecomment-3205934422
@opeida Ah interesting, thanks. Nice debugging!

So at the very least in that circumstance we can write something to disk, but it was a network error (one we should retry, but a network error still) that prevented the worker from marking the task as started, and thus it can't communicate the hostname where it's running.

Things we should do to fix that:

- [ ] Capture that error in the TaskSDK supervisor, and write it to the file (we already know the log file path to write to, as it is sent to us as part of the ExecuteTask workload message)
- [ ] Perhaps add something in Celery Executor to notice the case where there is an exception in the task result? (I think that's where the stack trace you showed would end up, in the CeleryResult, right?) We could write a short message in the "Audit Log" table for this
- [ ] Update the UI to handle the case where hostname is not set (and there are no remote logs available) and say "No logs found, check the Audit Log for errors"

This is a slight abuse of the audit log table, mind you.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
