[ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947470#comment-15947470
 ] 

Ishan Chattopadhyaya commented on SOLR-10317:
---------------------------------------------

Here's a rough list off the top of my head. It would be good for a student to 
add whatever I've missed, for the sake of completeness:
# Indexing benchmarks
## Standalone
## SolrCloud (various simple configurations (0))
## New replication mode (SOLR-9835)
# Various types of queries:
## Querying on numeric fields (exact queries, range queries)
## Querying on text fields
## Querying on string fields
## Sorting on numeric fields, string fields (with and without docValues)
## Extended Dismax queries
## Spatial search (using various strategies)
# Query (all the above) on
## Standalone Solr
## SolrCloud (on some simple configurations (0))
## Also, good if this can be tried out on the new replication mode (SOLR-9835).
# Partial Updates benchmarks (atomic updates, in-place updates)
# Faceting (string fields, numeric fields, enum fields)
# Grouping (string fields, numeric fields, enum fields)
# Spell check
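
To make the query items above more concrete, here is a minimal sketch of how a benchmark suite might enumerate one request per category. The field names (views_i, body_t, title_s), the collection name, and the helper functions are hypothetical and not part of any existing framework; only the Solr request parameters themselves (q, sort, defType, qf, facet, facet.field, group, group.field, spellcheck) are standard. The spell check entry assumes the request handler has a spellcheck component configured.

```python
from urllib.parse import urlencode

# Hypothetical field names; a real benchmark schema would define its own.
NUMERIC_FIELD = "views_i"
TEXT_FIELD = "body_t"
STRING_FIELD = "title_s"


def benchmark_queries():
    """Return a mapping of benchmark name -> Solr request parameters."""
    return {
        "numeric_exact": {"q": f"{NUMERIC_FIELD}:42"},
        "numeric_range": {"q": f"{NUMERIC_FIELD}:[10 TO 100]"},
        "text": {"q": f"{TEXT_FIELD}:solr"},
        "string": {"q": f'{STRING_FIELD}:"Apache Solr"'},
        "sort_numeric": {"q": "*:*", "sort": f"{NUMERIC_FIELD} desc"},
        "edismax": {"q": "apache solr", "defType": "edismax", "qf": TEXT_FIELD},
        "facet_string": {"q": "*:*", "facet": "true", "facet.field": STRING_FIELD},
        "group_string": {"q": "*:*", "group": "true", "group.field": STRING_FIELD},
        # Assumes a spellcheck component is wired into the request handler.
        "spellcheck": {"q": "solar", "spellcheck": "true"},
    }


def select_url(base_url, collection, params):
    """Build a /select URL for one benchmark query (no request is sent)."""
    return f"{base_url}/{collection}/select?{urlencode(params)}"
```

A runner could loop over this mapping, issue each URL against standalone Solr and each SolrCloud configuration, and record the elapsed time per entry.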

A Wikipedia-based dataset is usually available on all the Jenkins instances 
and could be used for this purpose. [~steve_rowe], [~thetaphi], can you please 
point to the download link for the enwiki.random.lines.txt file? (I have 
it, but I forget where I got it from.)

If I've missed something, please feel free to comment.

(0) - Some simple SolrCloud configurations could be:
# 1 shard, 2-3 replicas
# 2 shards, 1 replica each
# 2 shards, 2 replicas each
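
As a rough illustration, the configurations above could be expressed as Collections API CREATE requests. This is only a sketch: the base URL, the collection naming scheme, and the helper function are assumptions, and a real setup may also need to name a configset (e.g. via collection.configName). "2-3 replicas" is expanded here into two separate configurations.

```python
from urllib.parse import urlencode

# (numShards, replicationFactor) pairs taken from the list above.
CONFIGS = [(1, 2), (1, 3), (2, 1), (2, 2)]


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL (no request is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def all_create_urls(base_url="http://localhost:8983/solr"):
    """One CREATE URL per configuration, with hypothetical collection names."""
    return [
        create_collection_url(base_url, f"bench_{shards}x{replicas}", shards, replicas)
        for shards, replicas in CONFIGS
    ]
```

Each benchmark run would create one of these collections, index and query against it, and then delete it before moving to the next configuration.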

> Solr Nightly Benchmarks
> -----------------------
>
>                 Key: SOLR-10317
>                 URL: https://issues.apache.org/jira/browse/SOLR-10317
>             Project: Solr
>          Issue Type: Task
>            Reporter: Ishan Chattopadhyaya
>              Labels: gsoc2017, mentor
>
> Solr needs nightly benchmarks reporting. Similar Lucene benchmarks can be 
> found here, https://home.apache.org/~mikemccand/lucenebench/.
> Preferably, we need:
> # A suite of benchmarks that build Solr from a commit point, start Solr 
> nodes, both in SolrCloud and standalone mode, and record timing information 
> of various operations like indexing, querying, faceting, grouping, 
> replication etc.
> # It should be possible to run them either as an independent suite or as a 
> Jenkins job, and we should be able to report timings as graphs (Jenkins has 
> some charting plugins).
> # The code should eventually be integrated in the Solr codebase, so that it 
> never goes out of date.
> There is some prior work / discussion:
> # https://github.com/shalinmangar/solr-perf-tools (Shalin)
> # https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md 
> (Ishan/Vivek)
> # SOLR-2646 & SOLR-9863 (Mark Miller)
> # https://home.apache.org/~mikemccand/lucenebench/ (Mike McCandless)
> # https://github.com/lucidworks/solr-scale-tk (Tim Potter)
> There is support for building, starting, indexing/querying and stopping Solr 
> in some of these frameworks above. However, the benchmarks run are very 
> limited. Any of these can be a starting point, or a new framework can be used 
> as well. The motivation is to cover every piece of Solr functionality 
> with a corresponding benchmark that is run every night.
> Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure 
> [~shalinmangar] and [[email protected]] would help here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
