GaneshPatil7517 opened a new pull request, #1473: URL: https://github.com/apache/camel-website/pull/1473
## Overview

This PR fixes issue #1209 by adding an Algolia DocSearch configuration that enables indexing of component specifications and non-canonical documentation versions.

## Problem

The website search was unable to find several important keywords:

- **PyTorch** - Not indexed (table content excluded)
- **Bradley** - Not searchable (non-canonical versions not crawled)
- **firmata** - Not indexed (limited content extraction)

Root causes:

1. CSS selectors excluded table cells (`td`, `th`) from indexing
2. The crawler only indexed the canonical version (4.4.x), missing all other releases
3. Insufficient content type coverage

## Solution

Created `.docsearch.config.json` following Algolia DocSearch v3 standards with:

### Key Fixes

- **Added table cell indexing**: `td, th` selectors now capture component specifications
- **Multi-version crawling**: All release branches (next, latest, 4.4.x+, manual, docs, blog) are now indexed
- **Comprehensive content extraction**: Coverage of headings, code, lists, definitions, and table content
- **Smart element exclusions**: 18 selectors prevent indexing of navigation, footer, and hidden elements

### Configuration Details

- Index name: `apache_camel`
- Max URLs: 50,000
- Max depth: 20
- Content text selector: `p, li, td, th, dt, dd, span:not(.tooltip), div:not([class*='hidden']), table tbody, code, pre`

## Files Changed

1. **`.docsearch.config.json`** (NEW) - Main Algolia crawler configuration (2,754 bytes)
2. **`.docsearch.README.md`** (NEW) - Maintenance documentation for search configuration (4,337 bytes)
3. **`README.md`** (MODIFIED) - Added "Search Indexing Configuration" section (+29 lines)

## Testing

✅ Configuration validated against Algolia DocSearch v3 standards
✅ JSON syntax verified
✅ All required fields present
✅ CSS selectors match specification
✅ Multi-version URLs properly configured
✅ Search UI bundle confirmed intact (no regressions)

## Impact

- **Searchability**: PyTorch, Bradley, firmata and similar keywords are now discoverable
- **Version coverage**: Users can search across all documentation versions
- **Quality**: Exclusions prevent irrelevant results from navigation and footer elements
- **Maintainability**: Documentation is provided for future search configuration updates

## Notes for Maintainers

After merging:

1. Update the Algolia Dashboard with the new `.docsearch.config.json` configuration
2. Trigger a full re-index of the `apache_camel` index
3. Verify that the keywords appear in search results within 24-48 hours

This change is configuration-only; no code changes or new dependencies are required.

Fixes #1209
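
The checks listed under Testing (JSON syntax, required fields, table-cell selectors) could be reproduced with a small script along these lines. This is only a minimal sketch and not the validation actually used for this PR: the key names `index_name`, `start_urls`, and `selectors` assume the legacy DocSearch scraper layout and may not match the real `.docsearch.config.json`, and `validate_config` is a hypothetical helper.

```python
import json

# Assumed top-level keys (legacy DocSearch scraper layout); the real file may differ.
REQUIRED_KEYS = ("index_name", "start_urls", "selectors")
# Selectors the PR says must be in the text selector so table cells get indexed.
REQUIRED_TEXT_SELECTORS = ("td", "th")


def validate_config(raw: str) -> list[str]:
    """Return a list of problems found in a DocSearch config given as JSON text."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    selectors = config.get("selectors", {})
    text = selectors.get("text", "") if isinstance(selectors, dict) else ""
    # Split the comma-separated selector list into individual selectors.
    parts = {part.strip() for part in text.split(",")} if isinstance(text, str) else set()
    for sel in REQUIRED_TEXT_SELECTORS:
        if sel not in parts:
            problems.append(f"text selector does not include: {sel}")
    return problems
```

Passing the contents of `.docsearch.config.json` to `validate_config` returns an empty list when all of these checks pass, which makes it easy to wire into a pre-merge check. It does not confirm that the Algolia crawler accepts the file; that still has to be verified in the Algolia Dashboard.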
