Yesterday I helped solve a performance problem, triggered by requests issued with rows=2147483647 against an index with 3M documents.
In this concrete case the fix was easy, as it was possible to lower the value to rows=10. But the problem had stumped the person asking for weeks: the typical number of hits was 0 or 1, so he had assumed that the large number in rows did not have a performance impact.

This got me thinking: What about adding a debug=sanity option to Solr requests? It could inspect the concrete request as well as the index layout and issue warnings where appropriate. Checks could be

* rows > X
* facet.limit > X
* facet.limit=-1 and unique values in facet field > X
* facet.method=enum and unique values in facet field > X
* (filterCache_size * maxDoc/8) > (X * heap_size)
* facet.field=A and A is a StrField without DocValues

I am sure we can come up with more (a rough sketch of how a couple of these checks could be expressed is included at the end of this mail). My point is that some parts of troubleshooting Solr performance problems are easily definable and can be fully automated. Of course some of these will be false positives, but such is the nature of looking for warning signs.

As this would primarily be for people not familiar with the inner workings of Solr, some explanations would be needed:

# Potential problem: rows=2147483647
# Explanation: Specifying a number larger than 10,000 for rows can lead
# to high CPU load and slow response times, even if the number of hits
# in the search result is low.
# Technical: A high row count makes Solr allocate min(rows, maxDoc)
# ScoreDoc objects temporarily, which can trigger excessive garbage
# collection.
# Alternative: Use pagination
# (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)
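To illustrate the pagination alternative, plain start/rows paging could look something like this (q=foo and the page size of 10 are just placeholder values):

  q=foo&rows=10&start=0     (hits 1-10)
  q=foo&rows=10&start=10    (hits 11-20)
  q=foo&rows=10&start=20    (hits 21-30)

For deep paging, the cursor approach described on that wiki page avoids the cost of large start values: send cursorMark=* together with a sort that includes the uniqueKey field, and pass the returned nextCursorMark in the following request.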
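And to make the debug=sanity idea a bit more concrete, here is a rough, standalone sketch of what two of the request-only checks might look like. It deliberately does not touch Solr's internal APIs; the parameter map, the threshold constants ("X" above) and the warning texts are all placeholders:

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  public class RequestSanityCheck {
      // "X" from the list above - the real thresholds would need tuning
      private static final int MAX_SANE_ROWS = 10000;
      private static final int MAX_SANE_FACET_LIMIT = 10000;

      /** Inspects request parameters and returns human-readable warnings
       *  (an empty list means nothing suspicious was found). */
      public static List<String> check(Map<String, String> params) {
          List<String> warnings = new ArrayList<>();

          int rows = asInt(params.get("rows"), 10); // Solr default for rows is 10
          if (rows > MAX_SANE_ROWS) {
              warnings.add("Potential problem: rows=" + rows
                  + "\nExplanation: Specifying a number larger than " + MAX_SANE_ROWS
                  + " for rows can lead to high CPU load and slow response times,"
                  + " even if the number of hits in the search result is low."
                  + "\nAlternative: Use pagination.");
          }

          int facetLimit = asInt(params.get("facet.limit"), 100); // Solr default is 100
          if (facetLimit == -1 || facetLimit > MAX_SANE_FACET_LIMIT) {
              warnings.add("Potential problem: facet.limit=" + facetLimit
                  + "\nExplanation: An unlimited or very large facet.limit can be slow"
                  + " when the facet field holds many unique values.");
          }
          return warnings;
      }

      private static int asInt(String value, int defaultValue) {
          return value == null ? defaultValue : Integer.parseInt(value);
      }

      public static void main(String[] args) {
          Map<String, String> request = new HashMap<>();
          request.put("rows", "2147483647");
          for (String warning : check(request)) {
              System.out.println(warning + "\n");
          }
      }
  }

A real implementation would of course live inside Solr, where it could also look at index statistics (maxDoc, unique values per field, DocValues or not, filterCache size versus heap) for the index-dependent checks in the list above.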
- Toke Eskildsen, State and University Library, Denmark