gerlowskija commented on PR #123:
URL: https://github.com/apache/solr-sandbox/pull/123#issuecomment-3313184287

   At "Community over Code NA" last week I talked to INFRA about a place we can 
put the data-dumps so that curious users can avoid the (very!) time-consuming 
download and processing of wikipedia data.  They suggested we use nightlies, 
and I've since been able to upload a Solr-ready dataset: 
https://nightlies.apache.org/solr/benchmark-data/wiki/solr-wiki-batches-5k-1k.tar.gz.
  This tgz contains ~5k Solr-ready JSON files, each containing a batch of 5k 
wikipedia articles truncated at 1k each.
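   For anyone curious about the shape of the data, a quick sanity check after extracting the tarball (see step 1 below) is something like the following; the `.json` extension and flat directory layout are assumptions about how the archive unpacks, so adjust paths as needed:

   ```bash
   # Count the extracted batch files - should come out to roughly 5k
   ls .gatling/batches/*.json | wc -l

   # Peek at the start of one batch to see the Solr-ready JSON structure
   head -c 500 "$(ls .gatling/batches/*.json | head -n 1)"
   ```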
   
   For folks who just want to run the indexing benchmark, this simplifies the 
steps down to:
   
   1. Download pre-processed data:
       - `mkdir -p .gatling/batches && cd .gatling/batches && wget https://nightlies.apache.org/solr/benchmark-data/wiki/solr-wiki-batches-5k-1k.tar.gz && tar -xvf solr-wiki-batches-5k-1k.tar.gz`
   2. Start a local Solr - any Solr can be used: local or remote, Docker or bare metal, release or SNAPSHOT, etc. Benchmarking will assume "http://localhost:8983/solr" unless told otherwise. (See the sketch after this list for one Docker-based option.)
   3. Install the wiki configset in Solr
       - `./scripts/gatling/setup_wikipedia_tests.sh`
   4. Run the benchmark
       - `./gradlew gatlingRun --simulation index.IndexWikipediaBatchesSimulation`
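   For step 2, a minimal sketch using the official Solr Docker image (the image tag is an arbitrary choice here; any release or SNAPSHOT exposing http://localhost:8983/solr should work equally well):

   ```bash
   # Start a throwaway Solr on the default port
   docker run -d --name solr-benchmark -p 8983:8983 solr:9

   # Quick check that Solr is answering before running the setup script
   curl -s "http://localhost:8983/solr/admin/info/system?wt=json" | head -c 200
   ```

   Standalone mode is assumed above; if the wiki configset setup expects SolrCloud, start Solr in cloud mode instead.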

