[
https://issues.apache.org/jira/browse/ARROW-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407342#comment-17407342
]
Joris Van den Bossche edited comment on ARROW-13260 at 8/31/21, 1:29 PM:
-------------------------------------------------------------------------
I have been researching the canonical url option a bit (or, more generally,
ways to avoid issues with duplicated content).
So Google mentions using a {{<link rel="canonical" href="..." />}} link tag for
this. It seems that this is actually supported by sphinx when using the
{{html_baseurl}} configuration option. We don't use that yet, but adding it to
{{conf.py}} and setting it to "https://arrow.apache.org/docs/" adds proper
canonical link tags.
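
As a minimal sketch of that configuration (only the option name and the URL
come from the comment above; where exactly it would go in Arrow's {{conf.py}}
is an assumption):

```python
# conf.py (fragment): sketch only, not the project's actual configuration.
# With html_baseurl set, Sphinx's HTML builder emits a
# <link rel="canonical" href="..."> tag on each generated page,
# built from this base URL plus the page's path.
html_baseurl = "https://arrow.apache.org/docs/"
```

The trailing slash matters here, since the page path is appended to this value.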
So we could start using this now, and then for the already released versions of
the docs add this manually (I think writing a small script for this should be
doable).
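
A rough sketch of what such a script could look like, using only the Python
standard library. The function names, the directory layout (one directory of
built HTML per old release) and the assumption that every old page has a
same-named page under "/docs/" are hypothetical, not something decided in
this issue:

```python
import html
import re
from pathlib import Path

# Assumed to match the html_baseurl value discussed above.
BASE_URL = "https://arrow.apache.org/docs/"

# Detects an existing canonical link so pages are not tagged twice.
_HAS_CANONICAL = re.compile(r'<link[^>]*\brel=["\']canonical["\']', re.IGNORECASE)
_HEAD_CLOSE = re.compile(r"</head>", re.IGNORECASE)


def add_canonical(page: str, url: str) -> str:
    """Return the page with a canonical link inserted just before </head>.

    Pages that already have a canonical link, or that have no </head>,
    are returned unchanged.
    """
    if _HAS_CANONICAL.search(page):
        return page
    tag = f'<link rel="canonical" href="{html.escape(url, quote=True)}" />\n'
    return _HEAD_CLOSE.sub(lambda m: tag + m.group(0), page, count=1)


def canonical_url(html_file: Path, docs_root: Path, base_url: str = BASE_URL) -> str:
    """Map a file inside an old docs build to the same path under base_url."""
    rel = html_file.relative_to(docs_root).as_posix()
    return base_url.rstrip("/") + "/" + rel


def process_tree(docs_root: Path) -> int:
    """Rewrite every .html file under docs_root in place; return the count changed."""
    changed = 0
    for html_file in sorted(docs_root.rglob("*.html")):
        original = html_file.read_text(encoding="utf-8")
        updated = add_canonical(original, canonical_url(html_file, docs_root))
        if updated != original:
            html_file.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

It would be run once per old release directory, e.g.
{{process_tree(Path("docs/4.0"))}} (a hypothetical path). Note that it does not
check whether the target page still exists in the latest docs.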
But one possible drawback of canonical urls is that they can get out of
date. For example, if a page is removed or renamed, the older, already existing
versions of the docs will still point to the old url, and thus effectively to a
non-existent page. I don't know whether this is actually a problem, though?
According to
https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls,
adding a sitemap is also an alternative way to give Google a hint which page
to use in search results. There is a sphinx extension that can generate this
(https://github.com/jdillard/sphinx-sitemap, I haven't checked it out yet). So this
could be an easier option instead of the canonical urls, in case those
canonical urls need to be kept updated in older docs (the sitemap would only
list the pages for the stable release, and thus is easy to keep up to date by
regenerating it with sphinx when adding the built docs with a new release).
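
For illustration, enabling that extension might look roughly like the fragment
below. This is an untested sketch based on my reading of the sphinx-sitemap
README; the option names and the URL scheme value should be verified against
the extension's documentation before use:

```python
# conf.py (fragment): untested sketch, assuming sphinx-sitemap is installed.
# In a real conf.py, "sphinx_sitemap" would be appended to the existing
# extensions list rather than replacing it.
extensions = ["sphinx_sitemap"]

# The extension builds the sitemap URLs from html_baseurl.
html_baseurl = "https://arrow.apache.org/docs/"

# Assumption: by default the extension prefixes language and version to each
# URL; "{link}" alone keeps the URLs directly under "/docs/", which is where
# the stable docs live.
sitemap_url_scheme = "{link}"
```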
(I am no web expert, though, so not sure what option is best)
> [Doc] Host different released versions of the documentation + version switcher
> ------------------------------------------------------------------------------
>
> Key: ARROW-13260
> URL: https://issues.apache.org/jira/browse/ARROW-13260
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Documentation
> Reporter: Joris Van den Bossche
> Assignee: Joris Van den Bossche
> Priority: Major
> Fix For: 6.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Related to ARROW-1299 for hosting the nightly docs, we could also keep the
> built docs for older versions instead of overwriting them at each release.
> Currently, the built docs live in the apache/arrow-site repo (asf-site
> branch) in the "/docs/" subdirectory.
> When releasing, we could add the newly built docs for that release in a
> subdirectory instead. Something like "/docs/5.0/" or "/docs/version/5.0".
> (And we could retroactively add some docs of previous releases, if we want)
> To make this useful for the user, we need a version switcher in the sphinx
> theme layout. There are other projects that use the same sphinx theme that
> have added this (see e.g. https://mne.tools/stable/index.html), and there is
> some work to upstream this to the base theme (but in the short term we could
> also copy such a custom implementation).
> For the "stable" docs (latest release), we could either 1) keep a duplicated
> version of the latest built docs at "/docs/", or 2) symlink "/docs/" to
> "/docs/5.0/" (and update this for each release; although I am not sure this
> is possible, since the latter is a child directory of the former).
> We could also add a "/docs/stable/" and make this the default url.
> cc [~amol-] [~kszucs]