Yesterday I helped solve a performance problem, triggered by requests issued with rows=2147483647 against an index with 3M documents.
In this concrete case the fix was easy, as it was possible to lower the value to rows=10. But the problem had stumped the person asking for weeks: the typical number of hits was 0 or 1, so he had assumed that the large number in rows did not have a performance impact.

This got me thinking: What about adding a debug=sanity option to Solr requests? It could inspect the concrete request as well as the index layout and issue warnings where appropriate. Checks could be

* rows > X
* facet.limit > X
* facet.limit=-1 and unique values in facet field > X
* facet.method=enum and unique values in facet field > X
* (filterCache_size * maxDoc/8) > (X * heap_size)
* facet.field=A and A is a StrField without DocValues

I am sure we can come up with more (a rough sketch of how a couple of these checks could be expressed is included at the end of this mail). My point is that some parts of troubleshooting Solr performance problems are easily definable and can be fully automated. Of course some of these will be false positives, but such is the nature of looking for warning signs.

As this would primarily be for people not familiar with the inner workings of Solr, some explanations would be needed:

# Potential problem: rows=2147483647
# Explanation: Specifying a number larger than 10,000 for rows can lead
# to high CPU load and slow response times, even if the number of hits
# in the search result is low.
# Technical: A high row count makes Solr allocate min(rows, maxDoc)
# ScoreDoc objects temporarily, which can trigger excessive garbage
# collection.
# Alternative: Use pagination
# (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)
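To illustrate the pagination alternative, plain start/rows paging could look something like this (q=foo and the page size of 10 are just placeholder values):

  q=foo&rows=10&start=0     (hits 1-10)
  q=foo&rows=10&start=10    (hits 11-20)
  q=foo&rows=10&start=20    (hits 21-30)

For deep paging, the cursor approach described on that wiki page avoids the cost of large start values: send cursorMark=* together with a sort that includes the uniqueKey field, and pass the returned nextCursorMark in the following request.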
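And to make the debug=sanity idea a bit more concrete, here is a rough, standalone sketch of what two of the request-only checks might look like. It deliberately does not touch Solr's internal APIs; the parameter map, the threshold constants ("X" above) and the warning texts are all placeholders:

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  public class RequestSanityCheck {
      // "X" from the list above - the real thresholds would need tuning
      private static final int MAX_SANE_ROWS = 10000;
      private static final int MAX_SANE_FACET_LIMIT = 10000;

      /** Inspects request parameters and returns human-readable warnings
       *  (an empty list means nothing suspicious was found). */
      public static List<String> check(Map<String, String> params) {
          List<String> warnings = new ArrayList<>();

          int rows = asInt(params.get("rows"), 10); // Solr default for rows is 10
          if (rows > MAX_SANE_ROWS) {
              warnings.add("Potential problem: rows=" + rows
                  + "\nExplanation: Specifying a number larger than " + MAX_SANE_ROWS
                  + " for rows can lead to high CPU load and slow response times,"
                  + " even if the number of hits in the search result is low."
                  + "\nAlternative: Use pagination.");
          }

          int facetLimit = asInt(params.get("facet.limit"), 100); // Solr default is 100
          if (facetLimit == -1 || facetLimit > MAX_SANE_FACET_LIMIT) {
              warnings.add("Potential problem: facet.limit=" + facetLimit
                  + "\nExplanation: An unlimited or very large facet.limit can be slow"
                  + " when the facet field holds many unique values.");
          }
          return warnings;
      }

      private static int asInt(String value, int defaultValue) {
          return value == null ? defaultValue : Integer.parseInt(value);
      }

      public static void main(String[] args) {
          Map<String, String> request = new HashMap<>();
          request.put("rows", "2147483647");
          for (String warning : check(request)) {
              System.out.println(warning + "\n");
          }
      }
  }

A real implementation would of course live inside Solr, where it could also look at index statistics (maxDoc, unique values per field, DocValues or not, filterCache size versus heap) for the index-dependent checks in the list above.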
- Toke Eskildsen, State and University Library, Denmark