abhishekshenoy edited a comment on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-760583797
Hi Team,
We are facing a similar issue on GCP.
Apache Airflow version: 2.0
Kubernetes version (kubectl version): v1.18.2
Environment:
Cloud provider or hardware configuration: GCP
OS (e.g. from /etc/os-release): Container-Optimized OS
We use GCS to store our logs. Recently we have been seeing an issue while
executing Dataproc operators (CreateCluster, SubmitJob, DeleteCluster): although
the operator executes successfully, Airflow is unable to read the logs from
the pod and write them to GCS. This in turn results in a task failure, even
though the actual Dataproc operation has completed successfully in the
background. (Not all runs have this issue; we see it intermittently in some
runs.)
This becomes a major issue because the failed task is run again. The
SparkSubmitJob runs twice, resulting in duplicate data. In the case below,
Airflow respawned another task to stop the cluster, but the new task failed
because the cluster had already been stopped by the previous task, which was
marked as failed only because its logs could not be read.
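To make the sequence above concrete, here is a minimal toy model of it in plain Python. It does not use Airflow or the Dataproc API; `ToyCluster`, `run_with_retry`, and the `logs_uploaded` callback are hypothetical names introduced only for illustration, and the assumption that a missing log turns an otherwise successful attempt into a failure is our reading of the behaviour we observe, not a confirmed description of Airflow internals.

```python
class ToyCluster:
    """Stand-in for a Dataproc cluster (hypothetical, not the real API)."""

    def __init__(self):
        self.exists = True
        self.jobs_run = 0

    def submit_job(self):
        # Not idempotent: every call processes the data again.
        self.jobs_run += 1

    def delete(self):
        # Not idempotent: deleting a missing cluster is an error.
        if not self.exists:
            raise RuntimeError("cluster not found")
        self.exists = False


def run_with_retry(action, logs_uploaded, retries=1):
    """Run `action`; count an attempt as failed if its logs are missing."""
    outcomes = []
    for attempt in range(retries + 1):
        try:
            action()
        except RuntimeError as exc:
            outcomes.append(f"failed: {exc}")
            continue
        if logs_uploaded(attempt):
            outcomes.append("success")
            break
        # The action itself succeeded, but the attempt is marked failed.
        outcomes.append("failed: logs missing")
    return outcomes


def first_attempt_loses_logs(attempt):
    """Assumed failure pattern: only the first attempt's logs go missing."""
    return attempt > 0


cluster = ToyCluster()
submit_outcomes = run_with_retry(cluster.submit_job, first_attempt_loses_logs)
stop_outcomes = run_with_retry(cluster.delete, first_attempt_loses_logs)

print(submit_outcomes, "jobs run:", cluster.jobs_run)
print(stop_outcomes)
```

In this model the job is submitted twice even though the first submission worked, and the second stop attempt fails because the first one had already removed the cluster, which matches what we see in our runs.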
The exception stack trace of one such failed task is below:
```
*** Unable to read remote log from
gs://hnw-airflow-prod-ba7642cd7876f2/logs/extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log
*** 404 GET
https://storage.googleapis.com/download/storage/v1/b/hnw-airflow-prod-ba7642cd7876f2/o/logs%2Fextraction_workflow%2Fstop_cluster%2F2021-01-12T10%3A00%3A00%2B00%3A00%2F1.log?alt=media:
No such object:
hnw-airflow-prod-ba7642cd7876f2/logs/df_raw_file_extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log:
('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK:
200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
*** Trying to get logs (last 100 lines) from worker pod
dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464 ***
*** Unable to fetch logs from worker pod
dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464 ***
(404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id':
'b677a80e-a9f8-4794-9085-ffb74aa2f443', 'Cache-Control': 'no-cache, private',
'Content-Type': 'application/json', 'Date': 'Wed, 13 Jan 2021 17:58:33 GMT',
'Content-Length': '294'})
HTTP response body:
b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
\\"dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464\\" not
found","reason":"NotFound","details":{"name":"dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464","kind":"pods"},"code":404}\n'
```
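The download URL in the 404 line is percent-encoded, which makes it awkward to compare with the `gs://` path printed above it. The following small sketch decodes such a URL back into a bucket and object name using only the Python standard library. The helper name `gcs_object_from_download_url` is hypothetical, and the sketch assumes the URL has the `/download/storage/v1/b/<bucket>/o/<object>` shape seen in the trace.

```python
from urllib.parse import unquote, urlsplit


def gcs_object_from_download_url(url):
    """Return (bucket, object_name) for a GCS download URL (assumed shape)."""
    parts = urlsplit(url).path.split("/")
    # Assumed path shape: /download/storage/v1/b/<bucket>/o/<encoded object>
    b_index = parts.index("b")
    o_index = parts.index("o", b_index + 2)
    bucket = parts[b_index + 1]
    # The object name is percent-encoded, so "/" appears as %2F, ":" as %3A
    # and "+" as %2B; unquote restores them.
    object_name = unquote("/".join(parts[o_index + 1:]))
    return bucket, object_name


url = (
    "https://storage.googleapis.com/download/storage/v1/b/"
    "hnw-airflow-prod-ba7642cd7876f2/o/"
    "logs%2Fextraction_workflow%2Fstop_cluster%2F"
    "2021-01-12T10%3A00%3A00%2B00%3A00%2F1.log?alt=media"
)
bucket, object_name = gcs_object_from_download_url(url)
print(f"gs://{bucket}/{object_name}")
```

The decoded name can then be checked against the bucket directly to confirm whether the log file for that try number was ever written.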