Hi all,

I believe some time back we were talking about a strategy to keep search 
engines from pointing to ancient versions of the UIMA documentation.

I have read a bit on rel="canonical" and robots.txt.

1) per webpage - Apparently, one can place a `link rel="canonical"` element in 
the head of any HTML page. Search engines seeing this element will then 
usually not index the page, because they consider it a duplicate of the page 
the link points to.

2) via HTTP header/.htaccess - Since we probably don't want to patch up all our 
JavaDoc files, the information about a canonical source can also be sent in an 
HTTP `Link` header, e.g. via a suitable .htaccess file.

I guess the idea would be that for any old documentation page, we would want it 
to point to its latest version as its canonical source. I mean for every page, 
not only for the index page. This seems a bit tedious.
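
To make the per-page mapping concrete, here is a rough sketch of how the 
header value could be derived. The folder naming ("/d/<project>-<version>/" 
for old copies, "/d/<project>-current/" for the latest), the base URL and the 
function name are assumptions for illustration only, and the sketch assumes 
each old page still exists at the same relative path in the latest version, 
which may not hold for removed classes.

```python
import re

# Assumption: old docs live under /d/<project>-<version>/... and the latest
# copy lives under /d/<project>-current/... with the same relative path.
VERSIONED = re.compile(
    r"^/d/(?P<project>[a-z][a-z-]*)-(?P<version>\d+(?:\.\d+)*)/(?P<rest>.*)$"
)


def canonical_link_header(path, base="https://uima.apache.org"):
    """Return a Link header value for a versioned doc path, or None.

    Paths that do not look like a versioned documentation folder
    (including the "-current" folders themselves) yield None.
    """
    m = VERSIONED.match(path)
    if m is None:
        return None
    target = f"{base}/d/{m['project']}-current/{m['rest']}"
    return f'<{target}>; rel="canonical"'
```

The returned string is what would go after "Link:" in the response for the 
old page. Every old page needs its own value, which is the tedious part.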

My suggestion would be an alternative that exploits the website folder 
structure and uses robots.txt.

We disallow indexing of the "d" folder on the UIMA website.
We place all the "*-current" folders (svn copies of the latest documentation 
versions) under a dedicated folder (e.g. "d/current") and allow indexing of 
that folder.

In that way, the outdated versions of the documentation should be hidden from 
the search engines and the respective latest versions should be indexed.
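
As a quick sanity check of the idea, a robots.txt along these lines can be 
tried out with Python's standard urllib.robotparser. The rules, the example 
paths and the helper name are assumptions, not our actual layout. The Allow 
line is listed first because Python's parser applies the first matching rule, 
while some search engines pick the most specific rule instead, so this order 
should be safe for both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the proposal: everything under /d/ is
# blocked except the dedicated /d/current/ folder.
ROBOTS_TXT = """\
User-agent: *
Allow: /d/current/
Disallow: /d/
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT):
    """Return True if a generic crawler may fetch the given site path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The host is only needed to form a full URL; rules match on the path.
    return parser.can_fetch("*", "https://uima.apache.org" + path)
```

With these rules, is_crawlable("/d/current/uimaj/apidocs/index.html") should 
be allowed, while a path under an old version folder such as 
"/d/uimaj-2.3.1/" should be rejected.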

Opinions? Does anybody have experience with SEO?

Cheers,

-- Richard
