[ 
https://issues.apache.org/jira/browse/CAMEL-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152218#comment-17152218
 ] 

Aashna Jena commented on CAMEL-14952:
-------------------------------------

As I mentioned during the call today, I had prepared a config file with 
selectors similar to this one, but in my view it poses maintenance 
issues. Versions, menu items, and blog categories are not fixed: we keep 
adding and changing them. That is why I was against having to specify them 
explicitly in the config file, especially because it would mean opening a PR 
on the Algolia repo every time we make a change. We also recently merged two 
pages (docs and projects), and we might add or delete more pages in the 
future, so I would rather not keep explicit start_urls either, although we 
might need them to some extent.

My idea was to keep the config file that I suggested on the PR and add one new 
selector for the blog, where lvl0 = "Blog", lvl1 = header a.category, and 
lvl2 = h1, and to add the categories page to stop_urls to avoid duplicate 
results. This way, we won't need an explicit set of categories. I'll wait for 
some more opinions before I make changes, because [~Aemie] and I seem to be 
suggesting very different things.
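
A minimal sketch of the blog selector described above, assuming the legacy 
DocSearch scraper config format (start_urls with a selectors_key, stop_urls, 
and a default_value for a fixed lvl0). The index name, the categories URL, and 
the text selector are placeholders for illustration, not values from the 
actual Camel config, and the selectors already proposed on the PR are not 
reproduced here.

```python
import json

# Hypothetical values: index_name and the categories URL are placeholders,
# not taken from the real Camel DocSearch config.
config = {
    "index_name": "apache_camel",
    "start_urls": [
        # Pages under /blog/ are scraped with the "blog" selector set below.
        {"url": "https://camel.apache.org/blog/", "selectors_key": "blog"},
        "https://camel.apache.org/",
    ],
    # Assumed location of the categories listing; excluding it is meant to
    # keep the same blog posts from being indexed twice.
    "stop_urls": ["https://camel.apache.org/categories/"],
    "selectors": {
        # The "default" selector set from the PR would sit alongside "blog";
        # it is omitted from this sketch.
        "blog": {
            # Fixed top-level label, so no category list is hard-coded.
            "lvl0": {"selector": "", "default_value": "Blog"},
            # The category is read from the page itself.
            "lvl1": "header a.category",
            "lvl2": "h1",
            # Assumed content selector; adjust to the real blog markup.
            "text": "p, li",
        },
    },
}

print(json.dumps(config, indent=2))
```

The printed JSON is the shape that would go into the config file. Because 
lvl1 is taken from the page, adding or renaming a blog category should not 
require a config change.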

> Better search on the website
> ----------------------------
>
>                 Key: CAMEL-14952
>                 URL: https://issues.apache.org/jira/browse/CAMEL-14952
>             Project: Camel
>          Issue Type: Improvement
>          Components: website
>            Reporter: Zoran Regvart
>            Priority: Major
>         Attachments: 
> BH4D9OD16A_apache_camel_20200608-20200614_no_result_searches.csv, 
> List_Of_Crawled_Pages_by_DocSearch.txt, apache_demo.json, camel.json, 
> getbootstrap-searchresult.png, image-2020-06-13-14-39-08-776.png, 
> list_of_crawled_pages.txt, sitemap-camel.png
>
>
> We use [Algolia|http://algolia.com/] for the search functionality on the 
> website using their [free plan|https://www.algolia.com/for-open-source/] for 
> Open Source projects. The index is built by Algolia's crawler using 
> [DocSearch|https://docsearch.algolia.com/].
> When this was done we built our own UI on top of the Algolia JavaScript API, 
> as one of the requirements is that clients use Algolia's JavaScript clients. 
> We did not use the Algolia UI because at that point it was a rather large 
> JavaScript dependency to add and it would have slowed down the loading of 
> the website.
> We also have some [initial 
> work|https://github.com/apache/camel-website/pull/74] on creating our own 
> Algolia index at build time.
> The current search doesn't seem to index the whole website: some results 
> don't appear in the search, and it looks like most of the content from 
> Antora is not indexed. For example, when searching for {{removeHeader}}, the [FAQ 
> entry|https://camel.apache.org/manual/latest/faq/how-to-avoid-sending-some-or-all-message-headers.html]
>  is not found. There's also a list of failed searches on the Algolia 
> dashboard we can use to benchmark the search.
> What we need is to build the search index over the whole content. The approach 
> taken in [#74|https://github.com/apache/camel-website/pull/74] is a good start 
> for Hugo-generated content. We need to expand that to Antora-built content as 
> well.
> This search index would be built at the website build time and would include 
> both Hugo and Antora content in the same file or possibly in several files 
> if we use multi-index search. More on how indexes are built can be seen in 
> the [Algolia 
> documentation|https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/].
> We need to figure out what data to send and how to integrate this with 
> Antora; for Hugo we have a good idea from 
> [#74|https://github.com/apache/camel-website/pull/74]. Importantly, the 
> structure needs to be the same. One good source of inspiration on building 
> the index for Antora content is in the [Lunr.js 
> integration|https://github.com/Mogztter/antora-lunr].
> We need to build the index with the search UI in mind, i.e. the index needs 
> to contain the data we wish to present in the UI as well as enough content 
> for Algolia to be able to use the content to perform search. So starting with 
> a mockup of what we wish to present/utilize in the search UI and deriving the 
> data structure for the index from that would be a good start.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
