Hi Cameleers,

I've posted this on Zulip, but I'm repeating it here for anyone not active there.
Some of you might be aware that the current search setup, which uses Algolia's DocSearch feature, is going away. We have been given access to its replacement, Algolia Crawler (also free for us as an open source project).

The DocSearch configuration was converted, apparently automatically, to an Algolia Crawler configuration. Indexing with this new configuration reported a number of errors. They were caused by the large number of records generated from the larger pages on the website, such as components with a lot of options or examples. I've since reconfigured the Algolia Crawler and there are no errors now.

The configuration is a mix of JSON and JavaScript that can only be seen on Algolia, and currently only I have access to it. If anyone is interested in contributing, please let me know and I'll arrange access. There are many things we can improve in search, not only in indexing but in the overall experience, and there are plenty of interesting opportunities for someone with an interest in this.

Right now we have ~18.3k records in the index, created from 2.6k pages, with 6.39k pages ignored. The overwhelming majority of the ignored pages are ignored because their canonical URL points to the latest version (for the component reference, for example, that is currently 3.15.0). This means that we don't have records for any non-latest versions in the index.

I've added this as a comment on https://github.com/apache/camel-website/pull/724#issuecomment-1049268827 and I'm asking for feedback on what path we should take. Please comment on the PR or here with your thoughts.

zoran
--
Zoran Regvart