janhoy opened a new pull request, #3:
URL: https://github.com/apache/solr-orbit/pull/3

   https://issues.apache.org/jira/browse/SOLR-18255
   
   This PR contains the initial port of [OpenSearch Benchmark 
(OSB)](https://github.com/opensearch-project/opensearch-benchmark) to work with 
Apache Solr. The fork point from OSB is tagged `osb_fork_point` (OSB commit 
`92982c56`).
   
   The codebase retains the OSB Python package name (`osbenchmark`) and directory structure for now. Remaining known work is tracked in `TODO.md` and will likely be converted into JIRA tasks.
   
   ## How to review
   
   The PR is structured as **6 commits in logical order**. Each commit is independently coherent and reviewable in isolation. The recommended approach is to review one commit at a time using GitHub's commit view or `git log -p`. The final commit is the largest, but by that point the project shape is established and its changes read more clearly in context.
   
   | # | Commit | Files | What to focus on |
   |---|--------|-------|------------------|
   | 1 | Establish ASF legal and governance files | 12 | NOTICE attribution, license header format, CONTRIBUTING accuracy |
   | 2 | Update GitHub/CI infrastructure | 20 | Workflow correctness, removed vs. kept actions |
   | 3 | Rewrite documentation | 84 | Install steps, CLI examples, converter docs accuracy |
   | 4 | Remove OSB-specific dead code and binaries | 41 | Verify nothing Solr-relevant was swept up |
   | 5 | Add new Solr-specific modules | 25 | Conversion logic (`schema.py`, `query.py`), provisioner correctness |
   | 6 | Port core benchmark framework | 195 | `client.py`, `telemetry.py`, `runner.py`; see functional notes below |
   
   ## Summary of major changes
   
   ### 1. Solr-native client (`osbenchmark/client.py`)
   The OpenSearch Python client (`opensearch-py`) has been replaced with purpose-built `SolrAdminClient` and `SolrClient` classes that communicate with Solr over HTTP using `requests`/`pysolr`. All collection management, document indexing, and query execution now go through Solr's HTTP APIs (Collections API, `/select`, `/update`, etc.).
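
To make that HTTP surface concrete, here is a minimal sketch of the request URLs such a client issues. It only builds URLs with the standard library, so it runs without a Solr node. The helper names and the `localhost:8983` base URL are illustrative assumptions, not the `SolrAdminClient`/`SolrClient` API. The endpoints and parameters themselves (`/admin/collections?action=CREATE`, `numShards`, `replicationFactor`, `/select`, `/update?commit=true`) are standard Solr HTTP APIs.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr node; adjust for your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(base: str, action: str, **params: str) -> str:
    """Build a Collections API URL, e.g. action=CREATE or action=DELETE."""
    query = urlencode({"action": action, **params})
    return f"{base}/admin/collections?{query}"


def select_url(base: str, collection: str, q: str, rows: int = 10) -> str:
    """Build a /select query URL for one collection (q is URL-encoded)."""
    return f"{base}/{collection}/select?" + urlencode({"q": q, "rows": rows})


def update_url(base: str, collection: str, commit: bool = False) -> str:
    """Build a /update URL; documents are POSTed to it as a JSON array."""
    url = f"{base}/{collection}/update"
    return url + "?commit=true" if commit else url


if __name__ == "__main__":
    print(collections_api_url(SOLR_BASE, "CREATE", name="logs",
                              numShards="1", replicationFactor="1"))
    print(select_url(SOLR_BASE, "logs", "*:*"))
    print(update_url(SOLR_BASE, "logs", commit=True))
```

In a real client these URLs would be passed to, for example, `requests.get(url)` (or `requests.post(url, json=docs)` for `/update`), with the JSON response checked for errors.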
   
   ### 2. Solr provisioner (`osbenchmark/builder/solr_provisioner.py`)
   A new `SolrProvisioner` replaces the OpenSearch node provisioning machinery. 
It supports three deployment modes:
   - **`from-distribution`** — downloads a released Solr binary from 
`downloads.apache.org` or the ASF archive (including pre-9.0 paths).
   - **`from-sources`** — builds Solr from a local checkout with Gradle.
   - **`docker`** — pulls and starts the official Solr Docker image, including 
nightly builds.
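
The following is a sketch of how the `from-distribution` mode might resolve a release tarball URL. It assumes the ASF mirror layout, in which Solr releases before 9.0 sit under the Lucene project tree and 9.0+ releases sit under the standalone Solr project. `distribution_urls` and `major_version` are hypothetical names. Check the provisioner for the exact paths it actually uses.

```python
from typing import List

# Assumed ASF mirror roots; downloads.apache.org only keeps current releases.
ASF_ARCHIVE = "https://archive.apache.org/dist"
ASF_DOWNLOADS = "https://downloads.apache.org"


def major_version(version: str) -> int:
    """Return the major component of a dotted version string like '9.8.1'."""
    return int(version.split(".", 1)[0])


def distribution_urls(version: str) -> List[str]:
    """Candidate tarball URLs for a released Solr version, most preferred first."""
    tarball = f"solr-{version}.tgz"
    if major_version(version) < 9:
        # Pre-9.0 releases were published under the Lucene project tree.
        return [f"{ASF_ARCHIVE}/lucene/solr/{version}/{tarball}"]
    # 9.0+ releases live under the standalone Solr project; try the live
    # mirror first, then fall back to the archive for older versions.
    return [
        f"{ASF_DOWNLOADS}/solr/solr/{version}/{tarball}",
        f"{ASF_ARCHIVE}/solr/solr/{version}/{tarball}",
    ]
```

Returning a list of candidates rather than a single URL lets the caller try the fast mirror first and only hit the archive when a version has been rotated out.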
   
   `SolrDockerLauncher` handles container lifecycle. Version-aware logic 
handles the API differences between Solr 9.x and 10.x (e.g. collection creation 
flags).
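
At its simplest, the Docker mode's container lifecycle is a start step and a remove step. The sketch below only builds the `docker` command lines, which can then be run with `subprocess.run(args, check=True)`. The image name `solr` with a version tag, the container name, and the port mapping are assumptions. The sketch does not cover nightly images or any 9.x/10.x differences, and it is not `SolrDockerLauncher`'s actual interface.

```python
from typing import List

# Assumption: the official Docker Hub image "solr" with version tags such as "9.8".
DEFAULT_IMAGE = "solr"


def docker_run_args(container: str, tag: str, port: int = 8983,
                    image: str = DEFAULT_IMAGE) -> List[str]:
    """argv for starting a detached Solr container with its HTTP port published."""
    return [
        "docker", "run", "-d",
        "--name", container,
        "-p", f"{port}:8983",  # host port -> Solr's default port in the container
        f"{image}:{tag}",
    ]


def docker_remove_args(container: str) -> List[str]:
    """argv for stopping and removing the container in one step."""
    return ["docker", "rm", "-f", container]
```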
   
   ### 3. Solr-specific telemetry devices (`osbenchmark/telemetry.py`)
   Six new `SolrTelemetryDevice` subclasses collect Solr-specific metrics 
during a run: `SolrJvmStats`, `SolrNodeStats`, `SolrCollectionStats`, 
`SolrQueryStats`, `SolrIndexingStats`, `SolrCacheStats`. These poll the Solr 
Metrics API and write results via the existing `ResultWriter` pipeline.
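
Here is a minimal sketch of one polling telemetry device. It assumes the Solr 9.x JSON response of `GET /solr/admin/metrics?group=jvm&prefix=memory.heap`, in which heap usage appears under `metrics` → `solr.jvm` → `memory.heap.used`; the metrics format may differ in Solr 10. The `fetch` callable stands in for the HTTP call, and the record fields are hypothetical rather than what `ResultWriter` actually expects.

```python
import time
from typing import Any, Callable, Dict, List

# Query parameters a device like SolrJvmStats might send to /admin/metrics.
METRICS_PATH = "/admin/metrics"
METRICS_PARAMS = {"group": "jvm", "prefix": "memory.heap", "wt": "json"}


def heap_used_bytes(response: Dict[str, Any]) -> int:
    """Extract heap usage from a Solr 9.x-style Metrics API response."""
    return int(response["metrics"]["solr.jvm"]["memory.heap.used"])


def poll(fetch: Callable[[], Dict[str, Any]], samples: int,
         interval_s: float) -> List[Dict[str, Any]]:
    """Call fetch() `samples` times and record timestamped heap readings."""
    records = []
    for i in range(samples):
        records.append({
            "timestamp": time.time(),
            "name": "jvm_heap_used",
            "value": heap_used_bytes(fetch()),
            "unit": "byte",
        })
        if i < samples - 1:  # no need to sleep after the last sample
            time.sleep(interval_s)
    return records
```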
   
   ### 4. Solr runner operations (`osbenchmark/worker_coordinator/runner.py`)
   56 OpenSearch-specific runner classes have been removed (KNN, ML connectors, 
vector datasets, data streams, index templates, pipelines, etc.). In their 
place, Solr-specific runners have been added under `SolrRunner`: 
`SolrBulkIndex`, `SolrSearch`, `SolrPaginatedSearch`, `SolrCommit`, 
`SolrOptimize`, `SolrWaitForMerges`, `SolrCreateCollection`, 
`SolrDeleteCollection`.
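
OSB-style runners are async callables that receive a client and the operation's parameters and return a small result dict. The sketch below shows what a search runner of that shape could look like. `SolrSearchSketch`, the `solr.search(...)` client method, and the returned keys are assumptions for illustration, not the PR's `SolrSearch`. The fake client returns a canned response in the shape of Solr's `/select` JSON (`response.numFound`).

```python
import asyncio
from typing import Any, Dict


class SolrSearchSketch:
    """Hypothetical runner in the OSB style: one async call per operation."""

    async def __call__(self, solr: Any, params: Dict[str, Any]) -> Dict[str, Any]:
        # `solr.search` is an assumed async client method, not the PR's API.
        response = await solr.search(
            params["collection"],
            q=params.get("q", "*:*"),
            rows=params.get("rows", 10),
        )
        return {
            "weight": 1,  # one request counts as one operation
            "unit": "ops",
            "success": True,
            "hits": response["response"]["numFound"],
        }


class FakeSolr:
    """Stand-in client returning a canned /select-style response."""

    async def search(self, collection: str, q: str, rows: int) -> Dict[str, Any]:
        return {"response": {"numFound": 42, "start": 0, "docs": []}}


if __name__ == "__main__":
    print(asyncio.run(SolrSearchSketch()(FakeSolr(), {"collection": "logs"})))
```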
   
   ### 5. Workload model: index → collection (`osbenchmark/workload/`)
   The workload domain model has been updated throughout:
   - `Index` / `DataStream` / `IndexTemplate` → `Collection`
   - `IndexTemplate`, `ComponentTemplate`, `DataStream` and 
serverless/vector-related types removed
   - New `CreateCollectionParamSource` / `DeleteCollectionParamSource` / 
`SolrSearchParamSource`
   - OpenSearch Query DSL validation removed; Solr query params used instead
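
For the new collection param sources, the essential step is mapping a workload's collection entry onto Collections API `CREATE` parameters. `CollectionSpec` and `create_collection_params` are hypothetical names that sketch this mapping. The Solr parameter names (`numShards`, `replicationFactor`, `collection.configName`) are the real Collections API ones.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CollectionSpec:
    """Hypothetical workload entry describing one Solr collection."""
    name: str
    configset: str
    num_shards: int = 1
    replication_factor: int = 1


def create_collection_params(spec: CollectionSpec) -> Dict[str, str]:
    """Map a workload collection entry to Collections API CREATE parameters."""
    return {
        "action": "CREATE",
        "name": spec.name,
        "numShards": str(spec.num_shards),
        "replicationFactor": str(spec.replication_factor),
        "collection.configName": spec.configset,
    }
```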
   
   ### 6. OSB-to-Solr workload converter (`osbenchmark/conversion/`)
   A new converter pipeline (`workload_converter.py`, `detector.py`, 
`query.py`, `schema.py`, `field.py`) translates an OpenSearch Benchmark 
workload into Solr format:
   - Detects OSB-specific operations and query DSL automatically
   - Translates `bulk` → `bulk-index`, `force-merge` → `optimize`, index 
mappings → Solr configsets
   - Generates a minimal `solrconfig.xml` / `managed-schema.xml` configset 
skeleton
   - Invoked via `solr-benchmark convert-workload`; see `docs/converter/` for 
details
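
The core of such a converter is a set of lookup tables plus a few structural translations. The sketch below uses the two operation renames listed above. The OpenSearch-to-Solr field type table (targeting types from Solr's `_default` configset), the fallback to `string`, and the handful of query shapes are illustrative assumptions, not the contents of `schema.py` or `query.py`.

```python
from typing import Any, Dict

# Operation renames named in this PR description.
OPERATION_MAP = {"bulk": "bulk-index", "force-merge": "optimize"}

# Illustrative OpenSearch -> Solr _default configset field types.
FIELD_TYPE_MAP = {
    "keyword": "string",
    "text": "text_general",
    "integer": "pint",
    "long": "plong",
    "float": "pfloat",
    "double": "pdouble",
    "date": "pdate",
    "boolean": "boolean",
}


def convert_operation(op: Dict[str, Any]) -> Dict[str, Any]:
    """Rename an OSB operation type; unknown types pass through unchanged."""
    converted = dict(op)
    op_type = op["operation-type"]
    converted["operation-type"] = OPERATION_MAP.get(op_type, op_type)
    return converted


def field_xml(name: str, os_type: str) -> str:
    """Render one managed-schema <field> element for an OpenSearch mapping entry."""
    solr_type = FIELD_TYPE_MAP.get(os_type, "string")  # fallback is an assumption
    return f'<field name="{name}" type="{solr_type}" indexed="true" stored="true"/>'


def convert_query(dsl: Dict[str, Any]) -> str:
    """Translate a few simple query DSL shapes into a Solr `q` string."""
    if "match_all" in dsl:
        return "*:*"
    if "term" in dsl:
        field, value = next(iter(dsl["term"].items()))
        return f'{field}:"{value}"'
    raise ValueError(f"unsupported query: {dsl}")
```

Note that `field:"value"` is an exact match only on non-tokenized fields such as `string`. A faithful translation of `term` on a text field needs more care.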
   
   ### 7. Metrics store simplified (`osbenchmark/metrics.py`)
   `OsMetricsStore`, `OsTestRunStore`, `OsResultsStore`, and 
`IndexTemplateProvider` (all backed by OpenSearch) have been removed. The 
single supported store is now `FilesystemMetricsStore` (JSON + CSV + SQLite on 
local disk), accessed via `LocalFilesystemResultWriter`.
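
The SQLite part of a filesystem store can be sketched with the standard `sqlite3` module. The single `metrics` table and the helper names are hypothetical. The PR only states that `FilesystemMetricsStore` writes JSON, CSV, and SQLite to local disk.

```python
import json
import sqlite3
from typing import Any, Dict, Optional

# Hypothetical single-table layout for raw metric samples.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metrics (
    test_run_id TEXT NOT NULL,
    name        TEXT NOT NULL,
    value       REAL NOT NULL,
    unit        TEXT,
    meta        TEXT
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the SQLite file; ':memory:' keeps everything in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_metric(conn: sqlite3.Connection, run_id: str, name: str, value: float,
               unit: str, meta: Optional[Dict[str, Any]] = None) -> None:
    """Append one sample; free-form metadata is stored as a JSON string."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
            (run_id, name, value, unit, json.dumps(meta or {})),
        )


def summarize(conn: sqlite3.Connection, run_id: str, name: str) -> Dict[str, Any]:
    """Aggregate all samples of one metric in one test run."""
    row = conn.execute(
        "SELECT COUNT(*), MIN(value), MAX(value), AVG(value) "
        "FROM metrics WHERE test_run_id = ? AND name = ?",
        (run_id, name),
    ).fetchone()
    return {"count": row[0], "min": row[1], "max": row[2], "mean": row[3]}
```

Keeping raw samples in one table lets summary statistics be computed afterwards with plain SQL, with no OpenSearch cluster involved.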
   
   ### 8. Documentation site (`docs/`)
   A full user-facing documentation site is included, built with Jekyll + 
just-the-docs. Key sections: `user-guide/` (install, configure, workload 
authoring), `reference/` (telemetry, metrics, workload schema, commands), 
`converter/` (OSB migration guide), `cluster-config/`. Deployed to GitHub Pages 
via `.github/workflows/docs.yml`. See `docs/README.md` for local build 
instructions.
   
   ### 9. ASF license headers and housekeeping
   - All modified files carry a two-line ASF modification notice above the 
original OpenSearch header.
   - OSB-specific GitHub workflows (release, backport, integ-test, PyPI 
publish) removed; a docs deploy workflow added.
   - Bundled `pbzip2` binaries removed; `pbzip2` is now an optional system 
prerequisite.
   - `CONTRIBUTING.md`, `DEVELOPER_GUIDE.md`, `README.md` rewritten for the 
Solr/ASF context.
   - `TODO.md` tracks remaining incubation steps (package rename, CI, release 
process, etc.).
   
   ---
   
   The summary above is organized by the **9 functional areas**, regardless of which commit each change lands in. The 6-commit structure exists purely to aid review; it does not reflect the order in which the work was done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

