[ https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695772#comment-16695772 ]
Fabian Hueske commented on FLINK-9541:
--------------------------------------

Thanks [~kkrugler],

I think preventing {{flink-docs-master}} from being crawled would be nice but is not the biggest issue. Let's focus on the other parts first.

Right now, we are hosting docs for all versions >= 1.0. I'm not sure whether we should delete older versions, but IMO docs for versions < 1.4 should only be served if users explicitly ask for them.

To be honest, I don't have much experience with SEO, so please correct me if I got something wrong. If I understood the issue correctly, we need to provide two files ({{robots.txt}} and {{sitemap.xml}}) and put them into the root (?) folder of the documentation.

I see the following options:

1. We could increase the priority of {{flink-docs-stable}} and keep the docs for all other branches at the default (and possibly decrease the priority of {{flink-docs-master}}). If we do that, we don't need to touch the {{sitemap.xml}} for each release. If necessary, we can manually adjust the priority for older versions.
2. We could also fine-tune the priority of all branches and adjust the weights with each release.

I guess the first approach is easier because we don't need to worry about integrating it with the release process.
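To make option 1 concrete, here is a rough sketch of how the two files could be generated. It is only an illustration under assumptions: the priority values (0.9, 0.3, and the sitemap default of 0.5), the function names, and the exact location of the files are hypothetical and not something agreed on in this issue. The base URL is taken from the Javadoc links quoted in the description.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Docs root as it appears in the URLs quoted in the issue description.
BASE_URL = "https://ci.apache.org/projects/flink"

# Assumed priorities (not decided in this issue): boost stable, lower master.
BRANCH_PRIORITY = {
    "flink-docs-stable": 0.9,
    "flink-docs-master": 0.3,
}
# 0.5 is the default priority defined by the sitemap protocol.
DEFAULT_PRIORITY = 0.5


def build_sitemap(branches):
    """Return a sitemap <urlset> with one entry per docs branch."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for branch in branches:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{BASE_URL}/{branch}/"
        priority = BRANCH_PRIORITY.get(branch, DEFAULT_PRIORITY)
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


def build_robots_txt(sitemap_url):
    """Return a robots.txt that allows all crawling and points to the sitemap."""
    # An empty Disallow line means nothing is blocked.
    return f"User-agent: *\nDisallow:\n\nSitemap: {sitemap_url}\n"


if __name__ == "__main__":
    branches = ["flink-docs-stable", "flink-docs-release-1.5", "flink-docs-master"]
    print(build_sitemap(branches))
    print(build_robots_txt(f"{BASE_URL}/sitemap.xml"))
```

Because {{flink-docs-stable}} and {{flink-docs-master}} are fixed names, a file produced this way would not have to change with each release. Only the optional per-version entries would need manual adjustment.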
> Add robots.txt and sitemap.xml to Flink website
> -----------------------------------------------
>
>                 Key: FLINK-9541
>                 URL: https://issues.apache.org/jira/browse/FLINK-9541
>             Project: Flink
>          Issue Type: Improvement
>          Components: Project Website
>            Reporter: Fabian Hueske
>            Priority: Major
>
> From the [dev mailing list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - http://tomcat.apache.org/robots.txt references http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are other benefits, but reducing poor Google search results (to me) is the biggest win.
> E.g. https://www.google.com/search?q=flink+reducingstate (search on flink reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 <https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/api/common/state/ReducingState.html> and 1.4 <https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/api/common/state/ReducingState.html> are nowhere to be found.
> {quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)