Hi Tomás,

I suspect this might be related to LUCENE-5938. We changed the default
rewrite method for multi-term queries to load documents into a sparse
bit set first, and only upgrade to a dense bit set when we know that
many documents match. When there are lots of terms to intersect, we
end up spending significant CPU time building the sparse bit set, only
to upgrade to a dense bit set eventually, like before. This might be
what you are seeing.

You might see the issue less with the population field because it has
fewer unique values, so postings lists are longer and the DocIdSet
building logic can upgrade to a dense bit set more quickly.

Mike noticed this slowness when working on BKD trees, and we changed
this first phase to use a plain int[] array that we sort and
deduplicate instead of a fancier sparse bit set (LUCENE-6645),
which seemed to make things faster. Would it be possible for you to
also check a 5.3 snapshot?
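
In case it helps to picture the change, here is a rough sketch of the
idea: buffer doc IDs in a plain int[], switch to a dense bit set once
enough IDs have been collected, and otherwise sort and deduplicate the
buffer at the end. This is not the actual Lucene code. The class name,
the method names and the threshold parameter are all made up for
illustration, and java.util.BitSet stands in for Lucene's own bit sets.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Illustrative sketch only; names and threshold are assumptions, not Lucene's API. */
public class DocIdBufferSketch {
    private final int maxDoc;
    private final int threshold;      // assumed upgrade point, chosen by the caller
    private int[] buffer = new int[16];
    private int size = 0;
    private BitSet dense = null;      // non-null once we have upgraded

    public DocIdBufferSketch(int maxDoc, int threshold) {
        this.maxDoc = maxDoc;
        this.threshold = threshold;
    }

    /** Adds a doc ID; duplicates are allowed (several terms can match the same doc). */
    public void add(int doc) {
        if (dense == null && size == threshold) {
            upgradeToDense();
        }
        if (dense != null) {
            dense.set(doc);
            return;
        }
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2);
        }
        buffer[size++] = doc;
    }

    /** Copies the buffered IDs into a dense bit set and drops the buffer. */
    private void upgradeToDense() {
        dense = new BitSet(maxDoc);
        for (int i = 0; i < size; i++) {
            dense.set(buffer[i]);
        }
        buffer = null;
    }

    public boolean isDense() {
        return dense != null;
    }

    /** Returns the matching doc IDs in increasing order, without duplicates. */
    public int[] build() {
        if (dense != null) {
            return dense.stream().toArray();
        }
        int[] docs = Arrays.copyOf(buffer, size);
        Arrays.sort(docs);
        int out = 0;
        for (int i = 0; i < docs.length; i++) {
            // keep a value only if it differs from the last value kept
            if (out == 0 || docs[i] != docs[out - 1]) {
                docs[out++] = docs[i];
            }
        }
        return Arrays.copyOf(docs, out);
    }
}
```

Note that in this sketch duplicates count towards the threshold, so
the upgrade can happen a bit earlier than the number of distinct
matches alone would suggest.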




On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
<[email protected]> wrote:
> Hi, I'm seeing some query performance degradation between 4.10.4 and 5.2.1.
> It doesn't happen with all the queries, but for queries like range queries
> on fields with many different values the average time in 5.2.1 is worse than
> in 4.10.4. Is anyone seeing something similar?
>
> Test Details:
> * Single thread running queries continuously. I run the test twice for each
> Solr version.
> * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge instance
> with all the defaults but with 5G heap. Index in local disk (SSD)
> * Plain Solr releases, nothing custom. Single Solr core, not in SolrCloud
> mode, no distributed search.
> * "allCountries" geonames dataset (~8M small docs). No updates during the
> test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr 4.10.4
> (fits entirely in RAM)
> * jdk1.8.0_45
>
> Queries: 3k boolean queries (generated with terms from the dataset) with
> range queries as filters on "tlongitude" and "tlatitude" fields with
> randomly generated bounds, e.g.
> q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>
> Fields are:
> <field name="tlatitude" type="tdouble"/>
> <field name="tlongitude" type="tdouble"/>
> Field Type:
> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
> positionIncrementGap="0"/>
>
> In this case, Solr 4.10.4 was between 20% to 30% faster than 5.2.1 in
> average.
>
> http://snag.gy/2yPPM.jpg
>
> Doing only the boolean queries show no performance difference between 4.10
> and 5.2, same thing if I do filters on a string field instead of the range
> queries.
>
> When using "double" field type (precisionStep="0"), the difference was
> bigger:
>
> longitude/latitude fields:
> <field name="longitude" type="double" docValues="true"/>
> <field name="latitude" type="double" docValues="true"/>
> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
> positionIncrementGap="0"/>
>
> http://snag.gy/Vi5uk.jpg
> I understand this is not the best field type definition for range queries,
> I'm just trying to understand the difference between the two versions and
> why.
>
> Performance was OK when doing range queries on the "population" field
> (long), but that field doesn't have many different values, only 300k out of
> the 8.3M docs have the population field with a value different to 0. On the
> other hand, doing range queries on the _version_ field did show a graph
> similar to the previous one:
>
> <field name="_version_" type="long" indexed="true" stored="true"/>
> <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
> positionIncrementGap="0"/>
>
> http://snag.gy/4tc7e.jpg
>
> Any idea what could be causing this? Is this expected after some known
> change?
>
> With Solr 4.10, a single CPU core remains high during the test (close to
> 100%), but with Solr 5.2, different cores go up and down in utilization
> continuously. That's probably because of the different Jetty version I
> suppose.
> GC pattern looks similar in both. For both Solr versions I'm using the
> settings that ship with Solr (in solr.in.sh) except for Xmx and Xms
>



-- 
Adrien
