zregvart commented on pull request #724: URL: https://github.com/apache/camel-website/pull/724#issuecomment-1049268827
As a reminder, the deadline for migrating away from DocSearch is 15 March 2022. After that date the existing index will still be served, but the DocSearch crawler will no longer update it.

This is progressing nicely. I have figured out how to configure the Algolia Crawler, which gives us much more flexibility than the DocSearch crawler we currently use. The configuration now in place at Algolia Crawler reflects that, e.g. there are separate record extraction configurations for different parts of the website. This will probably need more refinement, so feedback on the search performance is very welcome.

When examining the search results, please make sure that you're still on the preview URL (https://pr-724--camel.netlify.app/), as it is easy to follow a link from a search result and land on the production website (https://camel.apache.org), where the old index is still used.

There is one known issue, however: the canonical link for each page points to the latest released version, e.g. currently for the component reference this is 3.15.0, and the crawler skips pages from any other version because their canonical link points elsewhere. There is a good explanation of this behavior in the [Antora documentation](https://docs.antora.org/antora/latest/playbook/site-url/#how-the-canonical-url-works). That means that only the latest version of the versioned documentation is indexed. This is not how the DocSearch crawler behaves; it seems to disregard the canonical link.

For that we have (I think) several options:

- accept that only the latest version is indexed
- remove canonical links
- manually crawl and [upload data](https://www.algolia.com/doc/guides/sending-and-managing-data/send-and-update-your-data/) to Algolia
- (not sure this is an option) set canonical links to actual page URLs
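
To make the canonical link issue concrete, here is a minimal Python sketch of the check a canonical-respecting crawler effectively performs. It only models the behavior described above and is not the Algolia Crawler's actual implementation. The names `CanonicalLinkParser`, `canonical_url` and `would_be_indexed`, and the `example.org` URLs in the usage note, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is kept; later ones are ignored.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel and attributes.get("href"):
            self.canonical = attributes["href"]


def canonical_url(page_url, html):
    """Return the absolute canonical URL declared in html, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    if parser.canonical is None:
        return None
    # Resolve relative hrefs against the URL of the page itself.
    return urljoin(page_url, parser.canonical)


def would_be_indexed(page_url, html):
    """Assumption: a page is indexed only if it has no canonical link
    or its canonical link points to the page itself."""
    canonical = canonical_url(page_url, html)
    if canonical is None:
        return True
    return canonical.rstrip("/") == page_url.rstrip("/")
```

For example, given a page body whose canonical link is `https://example.org/components/3.15.x/timer.html`, `would_be_indexed` returns `True` when called with that same URL and `False` when called with `https://example.org/components/3.14.x/timer.html`, which is the situation with the older documentation versions. Setting canonical links to the actual page URLs (the last option above) would make the check pass for every version.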
