thejens opened a new issue #17962:
URL: https://github.com/apache/airflow/issues/17962


   ### Description
   
   https://github.com/apache/airflow/pull/17946 implements a `/robots.txt` 
endpoint to stop search engines from crawling Airflow in cases where it is 
(accidentally) exposed to the public Internet.
   
   If we record any GET requests to that endpoint, we'd have a strong warning 
sign that the deployment is exposed, and could issue a warning in the UI, or 
even enable some kill-switch on the deployment.
   
   Some deployments are likely intentionally reachable from the Internet and 
rely on auth mechanisms on the `login` endpoint, so there should be a config 
option to suppress the warnings.
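
   A minimal sketch of the idea, in plain Python rather than against Airflow's actual webserver code. The names `ExposureMonitor`, `record_request`, and `suppress_warning` are hypothetical and only illustrate how hits on `/robots.txt` could be recorded and turned into a UI warning that a config option can switch off:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ExposureMonitor:
    """Hypothetical helper that tracks GET requests to /robots.txt."""

    # Hypothetical config option for deployments that are public on purpose.
    suppress_warning: bool = False
    hits: List[datetime] = field(default_factory=list)

    def record_request(self, method: str, path: str) -> None:
        # Only GET requests to /robots.txt count as a sign of exposure.
        if method.upper() == "GET" and path == "/robots.txt":
            self.hits.append(datetime.now(timezone.utc))

    def warning(self) -> Optional[str]:
        # No message if warnings are suppressed or nothing was recorded.
        if self.suppress_warning or not self.hits:
            return None
        return (
            f"/robots.txt was requested {len(self.hits)} time(s), "
            f"last at {self.hits[-1].isoformat()}. "
            "This deployment may be reachable from the public Internet."
        )
```

   Hits are still recorded when the warning is suppressed, so an operator could inspect them later; whether that is desirable is a design choice left open here.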
   
   An alternative approach would be to monitor for requests from specific 
user-agents used by crawlers, for the same reasons.
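
   The user-agent variant could look roughly like the sketch below. The function name `looks_like_crawler` is hypothetical, and the pattern is an illustrative, non-exhaustive set of well-known crawler tokens, not a list Airflow ships:

```python
import re
from typing import Optional

# Illustrative, non-exhaustive tokens that appear in the User-Agent
# headers of some well-known search engine crawlers.
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|duckduckbot|baiduspider|yandexbot",
    re.IGNORECASE,
)


def looks_like_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header matches a known crawler token."""
    if not user_agent:
        return False
    return CRAWLER_PATTERN.search(user_agent) is not None
```

   User-agent strings are trivially spoofed or omitted, so a match is only a hint that the deployment is exposed, and a missing match proves nothing.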
   
   ### Use case/motivation
   
   People who accidentally expose Airflow have a slightly higher chance of 
realising they've done so and tightening their security.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


