We can just disallow /d and then explicitly allow each of the *-current folders under it. The only difference I see is that we'd have a few more entries in robots.txt.
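To make that concrete, here is a rough sketch of what such a robots.txt could look like, checked with Python's standard urllib.robotparser. The folder names (uimaj-current, ruta-current, uimaj-2.4.0) are placeholders for illustration only, not a list of what is actually on the site. Python's parser applies the first matching rule, so the Allow lines come before the Disallow; crawlers that use most-specific-match precedence should arrive at the same result, since the Allow paths are longer than /d/.

```python
import urllib.robotparser

# Hypothetical robots.txt: folder names are illustrative assumptions.
# Allow lines come first because urllib.robotparser uses first-match.
ROBOTS_TXT = """\
User-agent: *
Allow: /d/uimaj-current/
Allow: /d/ruta-current/
Disallow: /d/
"""


def build_parser(text):
    """Parse robots.txt content given as a string."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_indexable(parser, path, agent="*"):
    """Return True if the given crawler may fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in (
        "/d/uimaj-current/index.html",  # latest docs (placeholder name)
        "/d/uimaj-2.4.0/index.html",    # old versioned docs (placeholder name)
        "/index.html",                  # rest of the site
    ):
        print(path, is_indexable(parser, path))
```

Each additional *-current folder would need its own Allow line, which is the "few more entries" mentioned above.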
-- Richard

> On 07.04.2016, at 22:36, Marshall Schor wrote:
>
> Hi,
>
> This sounds like a good idea to me :-)
>
> There's possibly one small issue with changing the folder structure. The
> DOCBOOK schemes have some fancy way to link between docbooks; these require
> that the books be kept relative to one another in some file tree structure.
> As long as that's not changed, I think there will be no problem.
>
> If anyone's curious, the relevant bits of config info are in the
> uima-docbook-olink project, in the various "site.xml" files. You can see
> refs to the famous "d" folder there. There may be a dependency on the
> "books" being just one directory layer under d/, so putting an extra layer
> might break things (but I'm not sure...).
>
> Maybe there's a way to do this without introducing a new level in the
> directory?
>
> -Marshall
>
> On 4/6/2016 4:43 PM, Richard Eckart de Castilho wrote:
>> Hi all,
>>
>> I believe some time back we were talking about a strategy to avoid search
>> engines pointing to ancient versions of the UIMA documentation.
>>
>> I have read a bit on rel="canonical" and robots.txt.
>>
>> 1) per webpage - Apparently, one can place a `link rel="canonical"` element
>> on any HTML page. Search engines seeing this tag will then not index this
>> page because it is considered to be a duplicate of whatever other page the
>> link points to.
>>
>> 2) via http header/htaccess - Since we probably don't want to patch up all
>> our JavaDoc files, the information about a canonical source can also be
>> sent in the HTTP header, e.g. via a suitable htaccess file.
>>
>> I guess the idea would be that for any old documentation page, we would
>> want it to point to its latest version as its canonical source. I mean for
>> every page, not only for the index page. This seems a bit tedious.
>>
>> My suggestion would be an alternative that exploits the website folder
>> structure and uses robots.txt.
>>
>> We disallow indexing of the "d" folder on the UIMA website.
>> We place all the "*-current" folders (svn copies of the latest
>> documentation versions) under a dedicated folder (e.g. "d/current") and
>> allow indexing that.
>>
>> In that way, the outdated versions of the documentation should be hidden
>> from the search engines and the respective latest versions should be
>> indexed.
>>
>> Opinions? Does anybody have experience with SEO?
>>
>> Cheers,
>>
>> -- Richard
