kaxil opened a new pull request, #59658: URL: https://github.com/apache/airflow/pull/59658
Preview: https://airflow.staged.apache.org/docs/apache-airflow/3.1.5/

I have been frustrated by Sphinx search for a long, long time. So after adding [dark-mode](https://github.com/apache/airflow-site/pull/1331) (and other related changes in https://github.com/apache/airflow-site), this was next on my list!

This PR introduces a fast, fully client-side search experience for the Apache Airflow documentation, powered by [Pagefind](https://pagefind.app/). The new search is keyboard-accessible (Cmd+K / Ctrl+K), works offline, and requires no external services. Search indexes are generated automatically at documentation build time and loaded entirely in the browser, enabling sub-50 ms queries even on large docs.

I have kept the Sphinx search as a backup, and it will keep functioning.

<img width="2882" height="1656" alt="image" src="https://github.com/user-attachments/assets/3da4a887-e7ee-4e9c-80ec-42d00b0058a1" />

<img width="1573" height="835" alt="image" src="https://github.com/user-attachments/assets/3eea7aed-5f50-4ff3-8ac7-8e213b9d8f70" />

## What’s included

New Sphinx extension: `pagefind_search`

Located in `devel-common/src/sphinx_exts/pagefind_search/`:

- `__init__.py`: Extension setup with configuration values and event handlers
- `builder.py`: Automatic index building with graceful fallback
- `static/css/pagefind.css`: Search modal and button styling with dark mode support
- `static/js/search.js`: Search functionality with keyboard shortcuts
- `templates/search-modal.html`: Search modal HTML template

### Features

- Keyboard shortcut (Cmd+K / Ctrl+K) opens the search modal
- Arrow key navigation through results
- Works offline (no external services)
- Automatic indexing during documentation build
- Dark mode support
- Sub-50 ms search performance
- Configurable content indexing via `conf.py`

### User Experience

Users can now:

- Press Cmd+K (or Ctrl+K) from any documentation page to search
- Navigate results with the arrow keys, press Enter to select, and Esc to close
- Search immediately, without network requests
- See the page title, breadcrumb, and excerpt for each result

### Configuration

Available in `conf.py`:

- `pagefind_enabled`: Toggle search indexing
- `pagefind_verbose`: Enable build logging
- `pagefind_root_selector`: Define the searchable content area
- `pagefind_exclude_selectors`: Exclude navigation, headers, footers
- `pagefind_custom_records`: Index non-HTML content (PDFs, etc.)

### Ranking Optimization

I also spent a lot of time tuning the knobs below. The extension now uses the following ranking parameters in `search.js`, which produced better results in my testing:

- **termFrequency: 1.0** - Standard term occurrence weighting
- **termSaturation: 0.7** - Moderate saturation to prevent over-rewarding repetition
- **termSimilarity: 7.5** - Maximum boost for exact phrase matches and similar terms
- **pageLength: 0** - No penalty for longer pages (important for reference documentation)
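To make the Configuration list concrete, here is a minimal `conf.py` sketch. The option names come from the description above. Every value, the selectors, and the `"pagefind_search"` import path are illustrative assumptions, not the defaults shipped in this PR.

```python
# conf.py (sketch). Option names are taken from the PR description;
# all values below are assumptions chosen for illustration only.

# Assumes the extension package is importable under this name.
extensions = ["pagefind_search"]

pagefind_enabled = True    # toggle search indexing
pagefind_verbose = False   # enable build logging when True

# Hypothetical CSS selector for the searchable content area.
pagefind_root_selector = "main"

# Hypothetical selectors for page chrome that should not be indexed.
pagefind_exclude_selectors = ["nav", "header", "footer"]

# Non-HTML content to index; the record format is not described in the
# PR text, so this is left empty rather than guessed.
pagefind_custom_records = []
```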
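For intuition about why the Ranking Optimization section sets `pageLength: 0` and uses a moderate `termSaturation`, the sketch below uses the textbook BM25 term weight. This is not Pagefind's scoring code. Treating `termSaturation` like BM25's `k1` and `pageLength` like BM25's `b` is a loose analogy and an assumption, and the page lengths are made up.

```python
def bm25_term_weight(tf: float, page_len: float, avg_page_len: float,
                     k1: float, b: float) -> float:
    """Textbook BM25 term-frequency component (IDF factor omitted)."""
    # b controls how strongly long pages are normalised down; b = 0 disables it.
    length_norm = 1.0 - b + b * (page_len / avg_page_len)
    # k1 controls saturation: the weight can never exceed k1 + 1.
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Same term count on a short page and on a long reference page.
short_page = bm25_term_weight(tf=3, page_len=200, avg_page_len=1000, k1=0.7, b=0.0)
long_page = bm25_term_weight(tf=3, page_len=20000, avg_page_len=1000, k1=0.7, b=0.0)

# With b = 0 the length term is exactly 1, so both pages get the same weight.
same_weight = short_page == long_page

# Repeating a term ten times as often gains little: the weight stays below k1 + 1.
many_repeats = bm25_term_weight(tf=30, page_len=200, avg_page_len=1000, k1=0.7, b=0.0)
```

Under this analogy, a non-zero length parameter would push long reference pages down even when they match just as well, which is the effect the PR avoids for reference documentation.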
