[ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176507#comment-17176507
 ] 

Ishan Chattopadhyaya commented on SOLR-10317:
---------------------------------------------

bq. Ishan Chattopadhyaya, David Smiley: I'm using the Icecat dataset 
(https://github.com/querqy/chorus#sample-data-details) for a project. Today we 
have lots of attributed products (over 8000 attributes across the roughly 100K 
products). We're working on adding actual price data to this dataset, which it 
currently doesn't have, and then I'm mulling over some ideas to generate 
reasonable queries (and judgement lists) to use with this dataset at scale.

[~epugh], I'll give it a try. Thanks for the tip!

> Solr Nightly Benchmarks
> -----------------------
>
>                 Key: SOLR-10317
>                 URL: https://issues.apache.org/jira/browse/SOLR-10317
>             Project: Solr
>          Issue Type: Task
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>              Labels: gsoc2017, mentor
>         Attachments: 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks-FINAL-PROPOSAL.pdf, 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks.docx, SOLR-10317.patch, 
> SOLR-10317.patch, Screenshot from 2017-07-30 20-30-05.png, 
> changes-lucene-20160907.json, changes-solr-20160907.json, managed-schema, 
> solrconfig.xml
>
>
> The benchmarking suite is now here: 
> [https://github.com/thesearchstack/solr-bench]
> Actual datasets and queries are still TBD.
>  
> --- Original description ---
>  Solr needs nightly benchmarks reporting. Similar Lucene benchmarks can be 
> found here, [https://home.apache.org/~mikemccand/lucenebench/].
>  
>  Preferably, we need:
>  # A suite of benchmarks that build Solr from a commit point, start Solr 
> nodes, both in SolrCloud and standalone mode, and record timing information 
> of various operations like indexing, querying, faceting, grouping, 
> replication etc.
>  # It should be possible to run them either as an independent suite or as a 
> Jenkins job, and we should be able to report timings as graphs (Jenkins has 
> some charting plugins).
>  # The code should eventually be integrated in the Solr codebase, so that it 
> never goes out of date.
>  
>  There is some prior work / discussion:
>  # [https://github.com/shalinmangar/solr-perf-tools] (Shalin)
>  # [https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md] 
> (Ishan/Vivek)
>  # SOLR-2646 & SOLR-9863 (Mark Miller)
>  # [https://home.apache.org/~mikemccand/lucenebench/] (Mike McCandless)
>  # [https://github.com/lucidworks/solr-scale-tk] (Tim Potter)
>  
>  There is support for building, starting, indexing/querying and stopping Solr 
> in some of the frameworks above. However, the benchmarks they run are very 
> limited. Any of these can be a starting point, or a new framework could 
> equally be used. The motivation is to be able to cover every functionality of 
> Solr with a corresponding benchmark that is run every night.
>  
>  Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure 
> [~shalinmangar] and [~[markrmil...@gmail.com|mailto:markrmil...@gmail.com]] 
> would help here.



