chrisrfu opened a new issue #15954:
URL: https://github.com/apache/airflow/issues/15954


   **Apache Airflow version**: v2.0.2
   
   
   **Environment**: 
   
   - **Cloud provider or hardware configuration**: on local machine using 
docker-compose
   - **OS** (e.g. from /etc/os-release): macOS Catalina 10.15.7
   - **Kernel** (e.g. `uname -a`): Darwin CFU.local 19.6.0 Darwin Kernel 
Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; 
root:xnu-6153.141.16~1/RELEASE_X86_64 x86_64
   
   **What happened**:
   
   I configured remote logging to an S3 bucket and ran a DAG consisting of 3 tasks. Sometimes only the log of the last task appears in the bucket; sometimes no logs appear in the bucket at all.
   
   **What you expected to happen**:
   
   Every time I run the DAG, I expect log files to appear in the S3 bucket, with each task getting its own folder for its logs.
   
   **How to reproduce it**:
   
   
   1. Using a combination of [**_Writing Logs to Amazon S3_** from the v1.10.10 docs](https://airflow.apache.org/docs/apache-airflow/1.10.10/howto/write-logs.html) and the [**_[logging]_** configuration reference from v2.0.2](https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#config-logging-base-log-folder), I have this in `airflow.cfg`:
   
   > [logging]
   base_log_folder = /opt/airflow/logs
   remote_logging = True
   remote_log_conn_id = s3_logs
   remote_base_log_folder = s3://my-bucket/logs
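
   For reference, the same `[logging]` settings can also be supplied as environment variables using Airflow's `AIRFLOW__SECTION__KEY` naming convention, which can be easier to wire into `docker-compose` than baking `airflow.cfg` into the image. This is just an equivalent sketch of the config above, with the same placeholder bucket:

   ```shell
   # Equivalent of the [logging] section above, e.g. under `environment:` in docker-compose
   export AIRFLOW__LOGGING__BASE_LOG_FOLDER=/opt/airflow/logs
   export AIRFLOW__LOGGING__REMOTE_LOGGING=True
   export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=s3_logs
   export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-bucket/logs
   ```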
   
   2. `docker build -t name:tag -f Dockerfile .`
   3. `docker-compose --env-file ./config/.env.local up airflow-init`
   4. `docker-compose --env-file ./config/.env.local up`
   5. In the airflow UI, go to _**Admin**_ --> _**Connections**_
   > Conn Id = s3_logs
   Conn Type = S3
   Extra = 
{"aws_access_key_id":"<key_here>","aws_secret_access_key":"<secret_here>"}
   6. Run DAG
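
   One quick sanity check on the `Extra` value from step 5: it should be valid JSON containing exactly the two AWS credential keys. A minimal sketch (the values here are placeholders, as in the form above):

   ```python
   import json

   # The Extra field pasted into the Airflow connection form (placeholder values)
   extra = '{"aws_access_key_id": "<key_here>", "aws_secret_access_key": "<secret_here>"}'

   parsed = json.loads(extra)  # raises ValueError if the JSON is malformed
   assert set(parsed) == {"aws_access_key_id", "aws_secret_access_key"}
   print("Extra field parses cleanly")
   ```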
   
   Log of first task:
   
   > *** Falling back to local log
   *** Log file does not exist: 
/opt/airflow/logs/update_org_2_accounts_deploy/update_accounts_activity/2021-05-19T20:00:39.231872+00:00/1.log
   *** Fetching from: 
http://4371804b14d3:8793/log/update_org_2_accounts_deploy/update_accounts_activity/2021-05-19T20:00:39.231872+00:00/1.log
   *** Failed to fetch log file from worker. 
HTTPConnectionPool(host='4371804b14d3', port=8793): Max retries exceeded with 
url: 
/log/update_org_2_accounts_deploy/update_accounts_activity/2021-05-19T20:00:39.231872+00:00/1.log
 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 
0x7ff65926a160>: Failed to establish a new connection: [Errno -2] Name or 
service not known',))
   
   Log of second task:
   
   > *** Falling back to local log
   *** Log file does not exist: 
/opt/airflow/logs/update_org_2_accounts_deploy/update_vintage/2021-05-19T20:00:39.231872+00:00/1.log
   *** Fetching from: 
http://4371804b14d3:8793/log/update_org_2_accounts_deploy/update_vintage/2021-05-19T20:00:39.231872+00:00/1.log
   *** Failed to fetch log file from worker. 
HTTPConnectionPool(host='4371804b14d3', port=8793): Max retries exceeded with 
url: 
/log/update_org_2_accounts_deploy/update_vintage/2021-05-19T20:00:39.231872+00:00/1.log
 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 
0x7ff65925e438>: Failed to establish a new connection: [Errno -2] Name or 
service not known',))
   
   Log of third task:
   
   > *** Reading remote log from 
s3://convene-dw-industry/logs/update_org_2_accounts_deploy/update_industry/2021-05-19T20:00:39.231872+00:00/1.log.
   [2021-05-19 20:00:54,158] {taskinstance.py:877} INFO - Dependencies all met 
for <TaskInstance: update_org_2_accounts_deploy.update_industry 
2021-05-19T20:00:39.231872+00:00 [queued]>
   [2021-05-19 20:00:54,711] {taskinstance.py:877} INFO - Dependencies all met 
for <TaskInstance: update_org_2_accounts_deploy.update_industry 
2021-05-19T20:00:39.231872+00:00 [queued]>
   [2021-05-19 20:00:54,712] {taskinstance.py:1068} INFO - 
   
   The rest of that log is fine, and it does exist in S3.
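
   Incidentally, the remote key layout visible in the excerpts above is `{remote_base_log_folder}/{dag_id}/{task_id}/{execution_date}/{try_number}.log`. A small sketch that rebuilds the key for the third task, using the placeholder bucket from the config above (`remote_log_key` is just an illustrative helper, not an Airflow API):

   ```python
   # Rebuild the S3 key for a task log from the components seen in the log excerpts
   def remote_log_key(base: str, dag_id: str, task_id: str,
                      execution_date: str, try_number: int) -> str:
       return f"{base}/{dag_id}/{task_id}/{execution_date}/{try_number}.log"

   key = remote_log_key(
       "s3://my-bucket/logs",
       "update_org_2_accounts_deploy",
       "update_industry",
       "2021-05-19T20:00:39.231872+00:00",
       1,
   )
   print(key)
   # → s3://my-bucket/logs/update_org_2_accounts_deploy/update_industry/2021-05-19T20:00:39.231872+00:00/1.log
   ```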
   
   **Anything else we need to know**:
   
   In the Airflow UI I tried:
   > Conn Id = s3_logs
   Conn Type = S3
   Login = <key_here>
   Password = <secret_here>
   
   and the same error occurred.
   
   I do have a Redshift database set up for the Airflow metadata, so since I already completed Step 5 previously, I go right to Step 6.
   

