alamb opened a new issue, #23258: URL: https://github.com/apache/datafusion/issues/23258
### Is your feature request related to a problem or challenge?

As software development in general becomes more and more agent driven / search driven, it is important to make sure that the content on the DataFusion website is part of the content used to make those development decisions.

If the DataFusion website is invisible to agents, then we won't show up when people ask said agents to help them build tools, etc.

There are a few things that https://datafusion.apache.org is clearly missing:

1. `/robots.txt` with clear crawl rules (basically should crawl everything) -- for example from duckdb: https://duckdb.org/robots.txt
2. `/sitemap.xml` listing canonical URLs, kept updated on publish -- for example from duckdb: https://duckdb.org/sitemap.xml

There is a bunch more stuff from https://isitagentready.com/datafusion.apache.org but I think robots.txt

### Describe the solution you'd like

Add robots.txt and sitemap.xml, ideally using one PR for each feature.

The sitemap.xml should be auto-generated as part of the Sphinx build process.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
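To make the request above concrete, here is a minimal sketch of what generating the two files could look like. It uses only the Python standard library. The function names and the page paths are hypothetical and chosen for illustration; this is not the DataFusion build's actual code, and the real list of pages would have to come from the Sphinx build itself.

```python
from urllib.parse import urljoin
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def _base(site_url: str) -> str:
    """Normalize the site URL so that it ends with exactly one slash."""
    return site_url.rstrip("/") + "/"


def build_robots_txt(site_url: str) -> str:
    """Allow every crawler everywhere and advertise the sitemap location."""
    sitemap_url = urljoin(_base(site_url), "sitemap.xml")
    return f"User-agent: *\nAllow: /\n\nSitemap: {sitemap_url}\n"


def build_sitemap_xml(site_url: str, pages: list[str]) -> str:
    """Return a sitemap listing one absolute URL per relative page path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in sorted(pages):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = urljoin(_base(site_url), page)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    # Hypothetical page paths, for illustration only.
    pages = ["index.html", "user-guide/introduction.html"]
    print(build_robots_txt("https://datafusion.apache.org"))
    print(build_sitemap_xml("https://datafusion.apache.org", pages))
```

One possible way to keep the sitemap updated on publish would be to call functions like these from a Sphinx `build-finished` event handler in `conf.py` and write the results into the HTML output directory. An existing Sphinx sitemap extension may be a better fit than hand-rolled code.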
