Re: Sanity checking in Solr

Upayavira Tue, 29 Sep 2015 05:31:43 -0700

Could it be added to the debug component? That seems like a natural
place for it. It could, as you say, look for standard things that
might make a query perform badly, and report them in a new <sanity>
element, or such.


Upayavira

On Tue, Sep 29, 2015, at 01:24 PM, Mikhail Khludnev wrote:
> Hi Toke! What a cool idea!
>
> On Tue, Sep 29, 2015 at 11:00 AM, Toke Eskildsen
> <t...@statsbiblioteket.dk> wrote:
>> Yesterday I helped solve a performance problem, triggered by issuing
>> requests with rows=2147483647 on an index with 3M documents.
>>
>> In this concrete case the fix was easy, as it was possible to lower this
>> to rows=10. But it had stumped the one asking for weeks - the typical
>> number of hits was 0 or 1, so he had assumed that the large number in
>> rows did not have a performance impact.
>>
>> This got me thinking: What about adding a debug=sanity option to Solr
>> requests? It could inspect the concrete request as well as the index
>> layout and issue warnings where appropriate. Checks could be
>>
>> * rows > X
>> * facet.limit > X
>> * facet.limit=-1 and unique values in facet field > X
>> * facet.method=enum and unique values in facet field > X
>> * (filterCache_size * maxDoc/8) > (X * heap_size)
>> * facet.field=A and A is a StrField without DocValues
>>
>> I am sure we can come up with more. My point is that some parts of
>> troubleshooting Solr performance problems are easily definable and can
>> be fully automated. Of course some of these will be false positives, but
>> such is the nature of looking for warning signs.
>>
>> As this would be primarily for people not familiar with the inner
>> workings of Solr, some explanations would be needed:
>>
>> # Potential problem: rows=2147483647
>> # Explanation: Specifying a number larger than 10,000 for rows can lead
>> # to high CPU load and slow response times, even if the number of hits
>> # in the search result is low.
>> # Technical: A high row count makes Solr allocate min(rows, maxDoc)
>> # ScoreDoc objects temporarily, which can trigger excessive garbage
>> # collection.
>> # Alternative: Use pagination
>> (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)
>>
>> - Toke Eskildsen, State and University Library, Denmark
>
>
>
> --
> Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics
>
>
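
As a rough illustration of the checks listed in the quoted proposal, here is a standalone Python sketch. It is not Solr code and does not use any Solr API: the threshold constants stand in for the "X" placeholders and are assumptions (only 10,000 for rows comes from the thread), and the `index` dictionary layout, `SanityWarning`, `filter_cache_bytes` and `sanity_check` are hypothetical names made up for the example. The filterCache formula is read here as one bit per document per cache entry.

```python
from dataclasses import dataclass

# Thresholds standing in for the "X" placeholders in the proposal.
MAX_ROWS = 10_000                # from the example explanation in the thread
MAX_FACET_LIMIT = 1_000          # assumption
MAX_FACET_UNIQUE = 100_000       # assumption
MAX_CACHE_HEAP_FRACTION = 0.25   # assumption


@dataclass
class SanityWarning:
    problem: str
    explanation: str


def filter_cache_bytes(filter_cache_size: int, max_doc: int) -> int:
    """filterCache_size * maxDoc/8: one bit per document per cache entry."""
    return filter_cache_size * max_doc // 8


def sanity_check(params: dict, index: dict) -> list:
    """Return warnings for a request, given a hypothetical index summary."""
    found = []

    rows = int(params.get("rows", 10))
    if rows > MAX_ROWS:
        allocated = min(rows, index["max_doc"])
        found.append(SanityWarning(
            f"rows={rows}",
            f"May temporarily allocate {allocated} ScoreDoc objects; "
            "consider pagination."))

    cache_bytes = filter_cache_bytes(index["filter_cache_size"], index["max_doc"])
    if cache_bytes > MAX_CACHE_HEAP_FRACTION * index["heap_bytes"]:
        found.append(SanityWarning(
            f"filterCache may need {cache_bytes} bytes",
            "A full filterCache could take a large share of the heap."))

    field_name = params.get("facet.field")
    if field_name is not None:
        field = index["fields"][field_name]
        limit = int(params.get("facet.limit", 100))
        unique = field["unique_values"]
        if limit > MAX_FACET_LIMIT:
            found.append(SanityWarning(
                f"facet.limit={limit}", "Large facet.limit."))
        if limit == -1 and unique > MAX_FACET_UNIQUE:
            found.append(SanityWarning(
                "facet.limit=-1",
                f"Field {field_name} has {unique} unique values."))
        if params.get("facet.method") == "enum" and unique > MAX_FACET_UNIQUE:
            found.append(SanityWarning(
                "facet.method=enum",
                f"Field {field_name} has {unique} unique values."))
        if field["type"] == "StrField" and not field["doc_values"]:
            found.append(SanityWarning(
                f"facet.field={field_name}",
                "StrField without DocValues."))

    return found


# Example: the request from the thread, plus a made-up facet field.
example_index = {
    "max_doc": 3_000_000,
    "heap_bytes": 1024 ** 3,
    "filter_cache_size": 512,
    "fields": {"author": {"unique_values": 5_000_000,
                          "type": "StrField", "doc_values": False}},
}
example_params = {"rows": "2147483647", "facet.field": "author",
                  "facet.limit": "-1"}
for w in sanity_check(example_params, example_index):
    print(f"# Potential problem: {w.problem}\n# Explanation: {w.explanation}")
```

Keeping each check as a plain threshold comparison makes false positives easy to tune away by adjusting the constants, which matches the "warning signs" framing in the proposal.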
