abhishekshenoy commented on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-760583797


   Hi Team,
   
   We are facing a similar issue on GCP.
   
   Apache Airflow version: 2.0
   
   Kubernetes version (kubectl version):v1.18.2
   
   Environment:
   Cloud provider or hardware configuration: GCP
   OS (e.g. from /etc/os-release): Container Optimized OS.
   
   We use GCS to store our logs. Recently we have been seeing an issue while 
executing Dataproc operators (CreateCluster, SubmitJob, DeleteCluster): even 
though the operator has executed successfully, Airflow is unable to read the 
logs from the pod and write them to GCS. This in turn results in a task 
failure, even though the actual Dataproc operation has completed successfully 
in the background.
   
   This becomes a major issue because the SparkSubmitJob is run twice, 
resulting in duplicate data. In the case below, Airflow also respawned another 
task to stop the cluster, but the new task failed because the cluster had 
already been stopped by the previous task, which was marked as failed because 
it was unable to read logs.
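
To illustrate the stop-cluster failure described above, here is a minimal sketch of one possible mitigation: treating an already-deleted cluster as success, so that a respawned task does not fail. It is only an illustration under assumptions. `ClusterNotFound`, `delete_cluster_idempotent`, and `fake_delete` are hypothetical names, not Airflow or Dataproc APIs, and this does not address the underlying log-reading problem or the duplicate Spark job run.

```python
class ClusterNotFound(Exception):
    """Hypothetical stand-in for the 'not found' error a cluster API raises."""


def delete_cluster_idempotent(delete_fn, cluster_name):
    """Delete a cluster, treating 'already gone' as success.

    Returns True if this call deleted the cluster, and False if the
    cluster was already absent (for example, deleted by an earlier attempt).
    """
    try:
        delete_fn(cluster_name)
        return True
    except ClusterNotFound:
        # An earlier attempt already removed the cluster; nothing left to do.
        return False


# In-memory fake used only to exercise the sketch (assumption, not a real API).
clusters = {"extraction-cluster"}


def fake_delete(name):
    if name not in clusters:
        raise ClusterNotFound(name)
    clusters.remove(name)


# The first attempt deletes the cluster; the respawned attempt is a no-op.
first = delete_cluster_idempotent(fake_delete, "extraction-cluster")
second = delete_cluster_idempotent(fake_delete, "extraction-cluster")
```

With a guard like this, the second attempt would complete without error instead of failing on the missing cluster.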
   
   The exception stack trace of one such failed task is below:
   ```
   *** Unable to read remote log from 
gs://hnw-airflow-prod-ba7642cd7876f2/logs/extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log
   *** 404 GET 
https://storage.googleapis.com/download/storage/v1/b/hnw-airflow-prod-ba7642cd7876f2/o/logs%2Fextraction_workflow%2Fstop_cluster%2F2021-01-12T10%3A00%3A00%2B00%3A00%2F1.log?alt=media:
 No such object: 
hnw-airflow-prod-ba7642cd7876f2/logs/df_raw_file_extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log:
 ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 
200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
   *** Trying to get logs (last 100 lines) from worker pod 
dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464 ***
   *** Unable to fetch logs from worker pod 
dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464 ***
   (404)
   Reason: Not Found
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 
'b677a80e-a9f8-4794-9085-ffb74aa2f443', 'Cache-Control': 'no-cache, private', 
'Content-Type': 'application/json', 'Date': 'Wed, 13 Jan 2021 17:58:33 GMT', 
'Content-Length': '294'})
   HTTP response body: 
b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
 \\"dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464\\" not 
found","reason":"NotFound","details":{"name":"dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464","kind":"pods"},"code":404}\n'
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

