george-zubrienko commented on issue #21387:
URL: https://github.com/apache/airflow/issues/21387#issuecomment-1040270851


   > Note that in the future we are likely to integrate open-telemetry for 
logging (there is a work in-progress on that) and that will allow to stream 
logs to any external or custom open-telemetry-compatible log sink in real time. 
This is the ultimate goal.
   
   I think this is definitely the way to go, and no PR I could offer would be 
better. For now, I've resolved our issue by adjusting the setup a bit:
   - set up logging to a persistent volume (PV), so the webserver can discover 
"local" logs while a task is running
   - set up remote logging, so once a task is done, its log is shipped to remote 
storage
   - set up a job to clean up the PV regularly, since local logs are of no use 
once they have been shipped to remote storage. A note on this one: we actually 
disabled the sidecar log groomer, because a) it runs as a sidecar container, and 
b) with >1 scheduler replica, we have >1 log groomer running `find ...` on the 
whole PV, which is really unnecessary, plus they race against each other.
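   
   A minimal sketch of what such a single cleanup job could look like, assuming 
it runs as one scheduled process against the mounted PV instead of one groomer 
per scheduler replica. The function name `clean_old_logs`, the log root path, 
and the retention period are hypothetical and not taken from the setup 
described above:

```python
import os
import time
from typing import List, Optional


def clean_old_logs(log_root: str, max_age_days: float,
                   now: Optional[float] = None) -> List[str]:
    """Delete files under log_root older than max_age_days.

    Returns the list of deleted file paths. `log_root` and `max_age_days`
    are illustrative parameters, not Airflow settings.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    deleted = []
    # Walk bottom-up so directories emptied by the deletion can be removed too.
    for dirpath, _dirnames, filenames in os.walk(log_root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    deleted.append(path)
            except FileNotFoundError:
                # The file vanished between listing and removal; skip it.
                pass
        # Remove the directory if it is now empty, but never the root itself.
        if dirpath != log_root:
            try:
                os.rmdir(dirpath)
            except OSError:
                pass  # Not empty, or already gone.
    return deleted
```

   Called as, for example, `clean_old_logs("/opt/airflow/logs", 7)` from a 
scheduled job (the path and the 7-day retention are placeholders). Running it 
in exactly one place avoids several processes scanning the same PV at once.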
   
   This way we have real-time logs served from the PV (file share), while logs 
from finished tasks are read from remote storage. This is also a cheaper setup, 
since the read transaction cost is lower on blob storage.
   
   Let me know if I should close this issue and link it to the one where the 
`open-telemetry` implementation is tracked!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

