[ 
https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695772#comment-16695772
 ] 

Fabian Hueske commented on FLINK-9541:
--------------------------------------

Thanks [~kkrugler],

I think preventing {{flink-docs-master}} from being crawled would be nice but 
is not the biggest issue. Let's focus on the other parts first.
Right now, we are hosting docs for all versions >= 1.0. Not sure if we should 
delete older versions but IMO docs for versions < 1.4 should only be served if 
users explicitly ask for them.

To be honest, I don't have much experience with SEO, so please correct me if I 
got something wrong. If I understood the issue right, we need to provide two 
files ({{robots.txt}} and {{sitemap.xml}}) and put them into the root (?) 
folder of the documentation. 
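
To make the {{robots.txt}} part concrete, here is a minimal sketch of how the file could be generated. The sitemap location and the choice to disallow {{flink-docs-master}} are assumptions for illustration only, not something that has been decided. As far as I know, crawlers only look for {{robots.txt}} at the root of the host, so it would probably have to be served from {{ci.apache.org/robots.txt}} rather than from the docs folder.

```python
# Sketch only: the sitemap URL and the disallowed path are hypothetical.

def build_robots_txt(sitemap_url, disallowed_paths=()):
    """Return robots.txt content that points crawlers at the sitemap."""
    lines = ["User-agent: *"]
    if disallowed_paths:
        for path in disallowed_paths:
            lines.append(f"Disallow: {path}")
    else:
        # An empty Disallow rule means "everything may be crawled".
        lines.append("Disallow:")
    lines.append("")
    # The Sitemap directive takes an absolute URL.
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    sitemap_url="https://ci.apache.org/projects/flink/sitemap.xml",
    disallowed_paths=["/projects/flink/flink-docs-master/"],
)
print(robots_txt)
```

Leaving {{disallowed_paths}} empty keeps {{flink-docs-master}} crawlable and only adds the sitemap reference.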

I see the following options:
1. We could increase the priority of {{flink-docs-stable}} and keep the docs 
for all other branches at the default (possibly decrease the priority of 
{{flink-docs-master}}). If we do that, we don't need to touch the 
{{sitemap.xml}} for each release. If necessary, we can manually adjust the 
priority for older versions.
2. We could also fine-tune the priority of all branches and adjust the weights 
with each release.

I guess the first approach is easier because we don't need to worry about 
integrating it with the release process.
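
A minimal sketch of the first option, assuming the docs live under {{https://ci.apache.org/projects/flink/<branch>/}}. The priority values (1.0 for stable, 0.3 for master) are placeholders; 0.5 is the default defined by the sitemap protocol. A real sitemap would list individual pages rather than one entry per branch.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "https://ci.apache.org/projects/flink"

# Hypothetical weights: only stable and master deviate from the default.
PRIORITIES = {
    "flink-docs-stable": 1.0,
    "flink-docs-master": 0.3,
}
DEFAULT_PRIORITY = 0.5  # default priority in the sitemap protocol


def build_sitemap(branches):
    """Return a sitemap.xml string with one <url> entry per docs branch."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for branch in branches:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{BASE_URL}/{branch}/"
        priority = PRIORITIES.get(branch, DEFAULT_PRIORITY)
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "flink-docs-stable",
    "flink-docs-master",
    "flink-docs-release-1.5",
    "flink-docs-release-1.4",
])
print(sitemap_xml)
```

Because {{flink-docs-stable}} always points to the latest release, the weights above would not need to change with each release. Only the list of branches grows.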



> Add robots.txt and sitemap.xml to Flink website
> -----------------------------------------------
>
>                 Key: FLINK-9541
>                 URL: https://issues.apache.org/jira/browse/FLINK-9541
>             Project: Flink
>          Issue Type: Improvement
>          Components: Project Website
>            Reporter: Fabian Hueske
>            Priority: Major
>
> From the [dev mailing 
> list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) 
> for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - 
> http://tomcat.apache.org/robots.txt references 
> http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing 
> to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are 
> other benefits, but reducing poor Google search results (to me) is the 
> biggest win.
> E.g. https://www.google.com/search?q=flink+reducingstate (search on flink 
> reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) 
> Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 
> <https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/api/common/state/ReducingState.html>
>  and 1.4 
> <https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/api/common/state/ReducingState.html>
>  are nowhere to be found.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
