Switching to Algolia Crawler for search

Zoran Regvart Thu, 24 Feb 2022 02:30:43 -0800

Hi Cameleers,
I've posted this on Zulip, but I'm repeating it here for anyone not
active there:


Some of you might be aware that the current setup for search with
Algolia, which uses the DocSearch feature, is going away, and we have
been given access to its replacement, Algolia Crawler (also for free,
as an open source project). The DocSearch configuration was converted
to an Algolia Crawler configuration, seemingly automatically.

When indexing using this new configuration there were a number of
errors reported. These errors were caused by a large number of records
indexed from larger pages we have on the website -- like components
with a lot of options or examples. I've since reconfigured the Algolia
Crawler and there are no errors now.

This configuration is a mix of JSON and JavaScript that can only be
seen on Algolia. If anyone is interested in contributing to it,
please let me know and I'll arrange access.

There are many things about search that we can improve, not only in
indexing but in the overall experience, and there are plenty of
interesting opportunities for anyone with an interest in this.
Currently only I have access to the configuration.

Right now we have ~18.3k records in the index, created from 2.6k
pages, with 6.39k pages ignored. The overwhelming majority of the
ignored pages are skipped because their canonical URL points to the
latest version (for the component reference, for example, that is
currently 3.15.0). This means that we don't have records for any
non-latest versions in the index. I've added this as a comment on
https://github.com/apache/camel-website/pull/724#issuecomment-1049268827
and I'm asking for feedback from folk on what path we should take.
Please comment on the PR or here with your thoughts.
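
To make the canonical URL behaviour concrete, here is a minimal sketch
of the kind of check that leads to a page being ignored. It is not the
Algolia Crawler's actual implementation; the function name and the
example.org URLs are hypothetical and only illustrate the idea that a
page whose canonical URL points elsewhere produces no records.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_ignored(page_url: str, canonical_url: Optional[str]) -> bool:
    """Return True when a page declares a canonical URL other than itself.

    Illustrative only: the real crawler's rules are assumed, not known.
    """
    if canonical_url is None:
        # No canonical link on the page, so the page stands for itself.
        return False

    def normalize(url: str):
        # Compare scheme, lower-cased host and path without trailing slash.
        parts = urlsplit(url)
        return (parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"))

    return normalize(page_url) != normalize(canonical_url)


# Hypothetical URLs: an older version whose canonical is the latest one.
old_page = "https://example.org/components/3.14.x/kafka.html"
latest_page = "https://example.org/components/latest/kafka.html"

skipped_old = is_ignored(old_page, latest_page)        # older version
skipped_latest = is_ignored(latest_page, latest_page)  # latest version
```

Under these assumptions, only the latest version of each page ends up
in the index, which matches the numbers above.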

zoran
-- 
Zoran Regvart
