Hi, I'm seeing some query performance degradation between 4.10.4 and 5.2.1.
It doesn't happen with all the queries, but for queries like range queries
on fields with many different values the average time in 5.2.1 is worse
than in 4.10.4. Is anyone seeing something similar?

Test Details:
* Single thread running queries continuously. I ran the test twice for each
Solr version.
* JMeter running on my laptop, Solr running on EC2, on an m3.xlarge
instance with all the defaults but with 5G heap. Index on local disk (SSD).
* Plain Solr releases, nothing custom. Single Solr core, not in SolrCloud
mode, no distributed search.
* "allCountries" geonames dataset (~8M small docs). No updates during the
test. Index Size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr 4.10.4
(fits entirely in RAM)
* jdk1.8.0_45
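
For anyone who wants to reproduce the single-threaded setup without JMeter, here is a rough sketch of the same idea in Python. It is not what I ran (the numbers above come from JMeter); the function names are made up for illustration, and the URLs you pass in are whatever your own Solr instance exposes.

```python
import time
from statistics import mean
from urllib.request import urlopen

def fetch(url):
    """Issue one HTTP GET and read the whole response body."""
    with urlopen(url) as resp:
        return resp.read()

def mean_latency_ms(urls, run=fetch):
    """Run the requests one at a time (single thread) and return the mean latency in ms.

    `run` is injectable so the loop can be exercised without a live server.
    """
    timings = []
    for url in urls:
        start = time.perf_counter()
        run(url)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)
```

Calling mean_latency_ms once per Solr version with the same list of URLs gives two averages that can be compared directly. Note this measures client-side time, so network latency is included.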

Queries: 3k boolean queries (generated with terms from the dataset) with
range queries as filters on "tlongitude" and "tlatitude" fields with
randomly generated bounds, e.g.
q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
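
To make the query shape concrete, here is a minimal sketch of how such requests could be generated. This is an illustration only, not the script I used: the helper names, the term list, and the choice of the full longitude/latitude ranges for the random bounds are assumptions.

```python
import random
from urllib.parse import urlencode

def random_range(lo, hi, rng):
    """Return a sorted pair of random bounds within [lo, hi]."""
    a, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
    return (a, b) if a <= b else (b, a)

def build_query(terms, rng):
    """Build one URL-encoded query string: two OR'ed name terms plus two range filters."""
    t1, t2 = rng.sample(terms, 2)
    w, x = random_range(-180.0, 180.0, rng)  # assumed longitude bounds
    y, z = random_range(-90.0, 90.0, rng)    # assumed latitude bounds
    params = [
        ("q", f"name:{t1} OR name:{t2}"),
        ("fq", f"tlongitude:[{w:.4f} TO {x:.4f}]"),
        ("fq", f"tlatitude:[{y:.4f} TO {z:.4f}]"),
    ]
    return urlencode(params)

if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed so both Solr versions see the same queries
    terms = ["foo", "bar", "baz"]    # placeholder terms; the real ones come from the dataset
    for _ in range(3):
        print(build_query(terms, rng))
```

Using a fixed seed matters here: both versions should receive exactly the same sequence of queries, otherwise differences in the random bounds could skew the comparison.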

Fields are:
<field name="tlatitude" type="tdouble"/>
<field name="tlongitude" type="tdouble"/>
Field Type:
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
positionIncrementGap="0"/>

In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1 on
average.

http://snag.gy/2yPPM.jpg

Running only the boolean queries shows no performance difference between 4.10
and 5.2, and the same is true if I filter on a string field instead of using
the range queries.

When using the "double" field type (precisionStep="0"), the difference was
bigger:

longitude/latitude fields:
<field name="longitude" type="double" docValues="true"/>
<field name="latitude" type="double" docValues="true"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
positionIncrementGap="0"/>

http://snag.gy/Vi5uk.jpg
I understand this is not the best field type definition for range queries,
I'm just trying to understand the difference between the two versions and
why.

Performance was OK when doing range queries on the "population" field
(long), but that field doesn't have many different values: only 300k out of
the 8.3M docs have a non-zero population. On the
other hand, range queries on the _version_ field did show a graph
similar to the previous one:

<field name="_version_" type="long" indexed="true" stored="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0"
positionIncrementGap="0"/>

http://snag.gy/4tc7e.jpg

Any idea what could be causing this? Is this expected after some known
change?

With Solr 4.10, a single CPU core remains high during the test (close to
100%), but with Solr 5.2, different cores go up and down in utilization
continuously. I suppose that's because of the different Jetty version.
The GC pattern looks similar in both. For both Solr versions I'm using the
settings that ship with Solr (in solr.in.sh) except for Xmx and Xms.
