thejens opened a new issue #17962: URL: https://github.com/apache/airflow/issues/17962
### Description

https://github.com/apache/airflow/pull/17946 implements a `/robots.txt` endpoint to block search engines from crawling Airflow in cases where it is (accidentally) exposed to the public Internet. If we record any GET requests to that endpoint, we'd have a strong warning flag that the deployment is exposed, and could issue a warning in the UI, or even enable some kill-switch on the deployment.

Some deployments are likely intentionally available and rely on auth mechanisms on the `login` endpoint, so there should be a config option to suppress the warnings.

An alternative approach would be to monitor for requests from specific user-agents used by crawlers, for the same reasons.

### Use case/motivation

People who accidentally expose Airflow have a slightly higher chance of realising they've done so and tightening their security.

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
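
To make the proposal in the description more concrete, here is a rough sketch of how the detection could work as a generic WSGI middleware. This is not Airflow's actual implementation: the class name `ExposureMonitor`, the `suppress_warnings` flag (standing in for the proposed config option), and the `CRAWLER_UA_HINTS` list are all hypothetical names chosen for illustration.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical, non-exhaustive substrings to look for in User-Agent headers.
CRAWLER_UA_HINTS = ("googlebot", "bingbot", "duckduckbot")


class ExposureMonitor:
    """WSGI middleware that counts requests suggesting crawler access.

    Assumption: wrapping the web app like this is only one possible way
    to observe requests; the issue does not prescribe a mechanism.
    """

    def __init__(self, app, suppress_warnings=False):
        self.app = app
        # Stand-in for the config option proposed in the issue.
        self.suppress_warnings = suppress_warnings
        self.robots_hits = 0
        self.crawler_hits = 0

    @property
    def possibly_exposed(self):
        """True if a warning should be surfaced (for example in the UI)."""
        if self.suppress_warnings:
            return False
        return self.robots_hits > 0 or self.crawler_hits > 0

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "")
        path = environ.get("PATH_INFO", "")
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()

        # Primary signal: any GET request for /robots.txt.
        if method == "GET" and path == "/robots.txt":
            self.robots_hits += 1
            if not self.suppress_warnings:
                log.warning("GET /robots.txt seen; deployment may be publicly reachable")

        # Alternative signal: a User-Agent that looks like a known crawler.
        if any(hint in user_agent for hint in CRAWLER_UA_HINTS):
            self.crawler_hits += 1

        # Always pass the request through unchanged.
        return self.app(environ, start_response)
```

The middleware only records and logs. Anything stronger, such as the kill-switch mentioned in the description, would need a separate decision about what a safe shutdown looks like.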
