Could it be added to the debug component? That seems like a natural place for it. It could, as you say, look for standard things that might make a query perform badly, and report them in a new <sanity> element, or such.
Upayavira

On Tue, Sep 29, 2015, at 01:24 PM, Mikhail Khludnev wrote:
> Hi Toke! What a cool idea!
>
> On Tue, Sep 29, 2015 at 11:00 AM, Toke Eskildsen
> <t...@statsbiblioteket.dk> wrote:
>> Yesterday I helped solve a performance problem, triggered by issuing
>> requests with rows=2147483647 on an index with 3M documents.
>>
>> In this concrete case the fix was easy, as it was possible to lower this
>> to rows=10. But it had stumped the person asking for weeks - the typical
>> number of hits was 0 or 1, so he had assumed that the large number in
>> rows did not have a performance impact.
>>
>>
>> This got me thinking: What about adding a debug=sanity option to Solr
>> requests? It could inspect the concrete request as well as the index
>> layout and issue warnings where appropriate. Checks could be
>>
>> * rows > X
>> * facet.limit > X
>> * facet.limit=-1 and unique values in facet field > X
>> * facet.method=enum and unique values in facet field > X
>> * (filterCache_size * maxDoc/8) > (X * heap_size)
>> * facet.field=A and A is a StrField without DocValues
>>
>> I am sure we can come up with more. My point is that some parts of
>> troubleshooting Solr performance problems are easily definable and can
>> be fully automated. Of course some of these will be false positives, but
>> such is the nature of looking for warning signs.
>>
>> As this would be primarily for people not familiar with the inner
>> workings of Solr, some explanations would be needed:
>>
>> # Potential problem: rows=2147483647
>> # Explanation: Specifying a number larger than 10,000 for rows can lead
>> # to high CPU load and slow response times, even if the number of hits
>> # in the search result is low.
>> # Technical: A high row count makes Solr allocate min(rows, maxDoc)
>> # ScoreDoc objects temporarily, which can trigger excessive garbage
>> # collection.
>> # Alternative: Use pagination
>> (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)
>>
>>
>> - Toke Eskildsen, State and University Library, Denmark
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
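
For illustration only, a few of the checks listed in the quoted proposal could be sketched roughly as below. This is not Solr code and not an existing API: the function name, the IndexInfo class and every threshold except the 10,000 rows figure (taken from the example explanation above) are assumptions standing in for the "X" placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List

ROWS_THRESHOLD = 10_000            # figure used in the example explanation
FACET_LIMIT_THRESHOLD = 1_000      # assumed stand-in for "X"
FILTER_CACHE_HEAP_FRACTION = 0.5   # assumed stand-in for "X"


@dataclass
class IndexInfo:
    """Hypothetical summary of the index and JVM the request runs against."""
    max_doc: int
    filter_cache_size: int
    heap_bytes: int


def sanity_warnings(params: Dict[str, str], index: IndexInfo) -> List[str]:
    """Return human-readable warnings for a request's parameters."""
    warnings = []

    # Check: rows > X. Up to min(rows, maxDoc) ScoreDoc objects are allocated.
    rows = int(params.get("rows", "10"))
    if rows > ROWS_THRESHOLD:
        allocated = min(rows, index.max_doc)
        warnings.append(
            f"Potential problem: rows={rows} "
            f"(up to {allocated} ScoreDoc objects); consider pagination"
        )

    # Check: facet.limit > X.
    facet_limit = params.get("facet.limit")
    if facet_limit is not None and int(facet_limit) > FACET_LIMIT_THRESHOLD:
        warnings.append(f"Potential problem: facet.limit={facet_limit}")

    # Check: (filterCache_size * maxDoc/8) > (X * heap_size).
    # maxDoc/8 is the size in bytes of one bitset with a bit per document.
    cache_bytes = index.filter_cache_size * index.max_doc // 8
    if cache_bytes > FILTER_CACHE_HEAP_FRACTION * index.heap_bytes:
        warnings.append(
            f"Potential problem: filterCache may need {cache_bytes} bytes"
        )

    return warnings
```

Calling sanity_warnings({"rows": "2147483647"}, IndexInfo(max_doc=3_000_000, filter_cache_size=512, heap_bytes=4 * 1024**3)) would flag only the rows parameter, which matches the case described in the thread. The checks that depend on the number of unique values in a facet field, or on the field type, would need index statistics that this sketch does not model.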