gerlowskija commented on PR #123: URL: https://github.com/apache/solr-sandbox/pull/123#issuecomment-3313184287
At "Community over Code NA" last week I talked to INFRA about a place we can put the data-dumps so that curious users can avoid the (very!) time-consuming download and processing of wikipedia data. They suggested we use nightlies, and I've since been able to upload a Solr-ready dataset: https://nightlies.apache.org/solr/benchmark-data/wiki/solr-wiki-batches-5k-1k.tar.gz. This tgz contains ~5k Solr-ready JSON files, each containing a batch of 5k wikipedia articles truncated at 1k each. For folks that just want to run the indexing benchmark, this simplifies the steps down to: 1. Download pre-processed data: - `mkdir -p .gatling/batches && cd .gatling/batches && wget https://nightlies.apache.org/solr/benchmark-data/wiki/solr-wiki-batches-5k-1k.tar.gz && tar -xvf solr-wiki-batches-5k-1k.tar.gz` 2. Start a local Solr - any Solr can be used: local or remote, Docker or baremetal, release or SNAPSHOT, etc. Benchmarking will assume "http://localhost:8983/solr" unless told otherwise. 3. Install wiki configset to Solr - ./scripts/gatling/setup_wikipedia_tests.sh 4. Run benchmark - ./gradlew gatlingRun --simulation index.IndexWikipediaBatchesSimulation -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
