We can just disallow /d and then explicitly allow all the *-current folders
under it. The only difference I see is that we'd have a few more
entries in the robots.txt.
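
A rough sketch of what that could look like, in Python so it can be checked with the standard library's urllib.robotparser. The folder names and the old version path used here are placeholders, not the actual list under /d, and the helper names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Placeholder folder names; the real "*-current" folders under /d may differ.
CURRENT_FOLDERS = ["uimaj-current", "uimafit-current"]


def build_robots_txt(current_folders):
    """Return robots.txt text that allows the given folders under /d and blocks the rest of /d."""
    lines = ["User-agent: *"]
    # Allow rules come first so that parsers which apply the first matching
    # rule (urllib.robotparser is one) see them before the broader Disallow.
    lines += [f"Allow: /d/{name}/" for name in current_folders]
    lines.append("Disallow: /d/")
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt, path):
    """Check a URL path against the generated rules for any user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


robots_txt = build_robots_txt(CURRENT_FOLDERS)
print(robots_txt)

# Placeholder paths: a current folder, an old version folder, a page outside /d.
for path in ["/d/uimaj-current/index.html", "/d/uimaj-2.4.0/index.html", "/index.html"]:
    print(path, can_fetch(robots_txt, path))
```

Putting the Allow lines before the Disallow line keeps the file working for crawlers that stop at the first matching rule; crawlers that pick the most specific match should not care about the order.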

-- Richard

> On 07.04.2016, at 22:36, Marshall Schor <[email protected]> wrote:
> 
> Hi,
> 
> This sounds like a good idea to me :-)
> 
> There's possibly one small issue with changing the folder structure.  The
> DOCBOOK schemes have some fancy way to link between docbooks; this requires
> that the books be kept relative to one another in some file tree structure.
> As long as that's not changed, I think there will be no problem.
> 
> If anyone's curious, the relevant bits of config info are in the
> uima-docbook-olink project, in the various "site.xml" files.  You can see refs
> to the famous "d" folder there.  There may be a dependency on the "books"
> being just one directory layer under d/, so putting an extra layer might
> break things (but I'm not sure...).
> 
> Maybe there's a way to do this without introducing a new level in the
> directory?
> 
> -Marshall
> 
> On 4/6/2016 4:43 PM, Richard Eckart de Castilho wrote:
>> Hi all,
>> 
>> I believe some time back we were talking about a strategy to avoid search 
>> engines pointing to ancient versions of the UIMA documentation.
>> 
>> I have read a bit on rel="canonical" and robots.txt.
>> 
>> 1) per webpage - Apparently, one can place a `link rel="canonical"` element 
>> on any HTML page. Search engines seeing this tag will then not index this 
>> page because it is considered to be a duplicate of whatever other page the 
>> link points to.
>> 
>> 2) via http header/htaccess - Since we probably don't want to patch up all 
>> our JavaDoc files, the information about a canonical source can also be sent 
>> in the HTTP header, e.g. via a suitable htaccess file.
>> 
>> I guess the idea would be that for any old documentation page, we would want 
>> it to point to its latest version as its canonical source. I mean for every 
>> page, not only for the index page. This seems a bit tedious.
>> 
>> My suggestion would be an alternative that exploits the website folder 
>> structure and uses robots.txt.
>> 
>> We disallow indexing of the "d" folder on the UIMA website.
>> We place all the "*-current" folders (svn copies of the latest documentation 
>> versions) under a dedicated folder (e.g. "d/current") and allow indexing 
>> that.
>> 
>> In that way, the outdated versions of the documentation should be hidden 
>> from the search engines and the respective latest versions should be indexed.
>> 
>> Opinions? Does anybody have experience with SEO?
>> 
>> Cheers,
>> 
>> -- Richard
>> 
>> 
> 
