These are the tests I re-ran:

1) Boolean queries with filters (range queries) on a tdouble field (precisionStep=8). Solr 4.10.4 and 5.2.1 are the official releases, 5.3.0-Exitable is a snapshot of the 5.x branch, and 5.3.0-Not-Exitable is the same snapshot but without wrapping the reader in ExitableDirectoryReader [1]:
http://snag.gy/CClfF.jpg

2) The same queries, but with the filters on "double" fields instead (precisionStep=0):
http://snag.gy/jKsrV.jpg

I created https://issues.apache.org/jira/browse/SOLR-7875 and https://issues.apache.org/jira/browse/SOLR-7876

[1] The code I changed in SolrIndexSearcher for the test:

  private static DirectoryReader wrapReader(SolrCore core, DirectoryReader reader) throws IOException {
    assert reader != null;
    DirectoryReader wrapped = UninvertingReader.wrap(reader,
        core.getLatestSchema().getUninversionMap(reader));
    // Test-only switch: skip the ExitableDirectoryReader wrapping when the
    // solr.disableQueryTimeout system property is set to true.
    boolean disableExitableDirectory = Boolean.getBoolean("solr.disableQueryTimeout");
    if (disableExitableDirectory) {
      return wrapped;
    }
    return ExitableDirectoryReader.wrap(wrapped, SolrQueryTimeoutImpl.getInstance());
  }
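(To get the "Not-Exitable" behavior with this patch, the property just needs to reach the JVM as a system property, e.g. "bin/solr start -Dsolr.disableQueryTimeout=true", or a line like SOLR_OPTS="$SOLR_OPTS -Dsolr.disableQueryTimeout=true" in solr.in.sh. The property only exists because of the wrapReader change above; it is not a stock Solr option.)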
On Tue, Aug 4, 2015 at 10:13 AM, Tomás Fernández Löbbe <[email protected]> wrote:
> The stack traces I shared are also from the EC2 instances, using Amazon Linux. I can also include a test on Mac OS if you are interested in the results.
> These are some more tests I ran using 5.3 (r1693977). "Exitable" is with the ExitableDirectoryReader wrap, "Not-Exitable" skips the wrapping of the reader completely in SolrIndexSearcher.
>
> Boolean queries with filters on the tdouble fields "tlongitude" and "tlatitude":
> http://snag.gy/1zjUy.jpg
> The difference between the two is very small. 5.3 looks better in performance than 5.2.1 (probably the fix that Adrien previously mentioned), but I ran the test from a different network/location, so the network latency may be different. I'll run the test against 5.2.1 and 4.10.4 again and update those results. Here is that preliminary graph:
> http://snag.gy/jR43Z.jpg
>
> The "ExitableDirectory" vs. "Not ExitableDirectory" comparison did have a huge difference when running the range queries with the double field (precisionStep=0):
> http://snag.gy/frW8f.jpg
>
> On Tue, Aug 4, 2015 at 1:43 AM, Uwe Schindler <[email protected]> wrote:
>> Hi,
>>
>> I just asked if maybe the stack dumps were from his MacOSX local development machine. Otherwise the time difference between 4.10 and 5.2 on MacOSX should be much larger - just as a cross check!
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: [email protected]
>>
>> > -----Original Message-----
>> > From: Adrien Grand [mailto:[email protected]]
>> > Sent: Tuesday, August 04, 2015 10:35 AM
>> > To: [email protected]
>> > Subject: Re: Understanding performance degradation in range queries between Solr 5.2.1 and 4.10.4
>> >
>> > Tomás said Solr was running on EC2 in his initial email, so this can't be MacOSX.
>> >
>> > On Tue, Aug 4, 2015 at 10:21 AM, Uwe Schindler <[email protected]> wrote:
>> > > Hi,
>> > >
>> > > Just related: Do you run tests on MacOSX? NanoTime is damn slow there! And it's also not monotonic! There was a discussion on hotspot-dev about that one year ago, but no solution until now.
>> > >
>> > > MacOS had no real nanotime impl in the kernel, so Java falls back to wall clock, which is slow and non-monotonic with ntp or vmware... (see test failures in the past on Policeman when ntp adjusted the clock).
>> > >
>> > > If you run on Linux this could also be a good test: on OSX the difference to the previous version should be even larger.
>> > >
>> > > Uwe
>> > >
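(Side note on the nanoTime cost question: a very rough way to get a ballpark figure for System.nanoTime() on a given box is a loop like the sketch below. This is only illustrative, not a proper benchmark; something like JMH would give more trustworthy numbers.)

  // Rough, illustrative estimate of the cost of a System.nanoTime() call.
  // Not a rigorous benchmark: no warmup control, and loop overhead is included.
  public class NanoTimeCost {
    public static void main(String[] args) {
      final int iterations = 50_000_000;
      long sink = 0; // keep a data dependency so the JIT cannot drop the calls
      long start = System.nanoTime();
      for (int i = 0; i < iterations; i++) {
        sink += System.nanoTime();
      }
      long elapsed = System.nanoTime() - start;
      System.out.println(String.format("~%.1f ns per call (sink=%d)", elapsed / (double) iterations, sink));
    }
  }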
>> > > Am 4. August 2015 03:07:45 MESZ, schrieb Anshum Gupta <[email protected]>:
>> > >>
>> > >> Guess bypassing that check and always returning true instead would help confirm that. If that's the case, we should just initialize the timeout to Long.MAX_VALUE and check for that to short-circuit?
>> > >>
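If I understand that suggestion correctly, the short-circuit would look roughly like the sketch below. This is not the actual SolrQueryTimeoutImpl code, just the shape of the idea: treat Long.MAX_VALUE as "no timeout requested" and return before ever touching System.nanoTime().

  import org.apache.lucene.index.QueryTimeout;

  // Sketch only: a QueryTimeout that avoids the per-call System.nanoTime()
  // overhead entirely when no timeout was set for the current query.
  class NoTimeoutShortCircuitSketch implements QueryTimeout {
    static final long NO_TIMEOUT = Long.MAX_VALUE;
    private final long timeoutAtNanos;

    NoTimeoutShortCircuitSketch(long timeoutAtNanos) {
      this.timeoutAtNanos = timeoutAtNanos;
    }

    @Override
    public boolean shouldExit() {
      if (timeoutAtNanos == NO_TIMEOUT) {
        return false; // short-circuit: no System.nanoTime() per TermsEnum.next()
      }
      return timeoutAtNanos - System.nanoTime() < 0; // overflow-safe deadline check
    }
  }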
>> > >> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>> > >>>
>> > >>> Yes, I saw that, but thought it could be the underlying implementation, not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to System.nanoTime then...
>> > >>>
>> > >>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <[email protected]> wrote:
>> > >>>>
>> > >>>> Thanks for sharing the traces, it looks like my intuition was wrong. :) They seem to point to ExitableDirectoryReader$ExitableTermsEnum.next, which checks whether the time is out before delegating.
>> > >>>>
>> > >>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>> > >>>> > Thanks Adrien,
>> > >>>> > I'll run the tests with a 5.3 snapshot and post the results here.
>> > >>>> > In case this helps, this is the hprof samples output (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt) for 4.10.4 and 5.2.1 in my test:
>> > >>>> >
>> > >>>> > Solr 4.10.4:
>> > >>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>> > >>>> > rank   self  accum   count  trace method
>> > >>>> >    1 75.07% 75.07% 182812 300523 java.net.PlainSocketImpl.socketAccept
>> > >>>> >    2  4.27% 79.34%  10408 301576 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>> > >>>> >    3  4.15% 83.49%  10108 301585 org.apache.lucene.index.FilteredTermsEnum.docs
>> > >>>> >    4  3.46% 86.95%   8419 301582 org.apache.lucene.index.FilteredTermsEnum.next
>> > >>>> >    5  2.49% 89.44%   6070 301573 java.net.SocketInputStream.socketRead0
>> > >>>> >    6  1.99% 91.43%   4848 301599 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>> > >>>> >    7  1.98% 93.42%   4824 301583 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>> > >>>> >    8  1.57% 94.99%   3824 301589 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>> > >>>> >    9  1.44% 96.42%   3504 301594 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>> > >>>> >   10  1.09% 97.51%   2655 301581 java.nio.Bits.copyToArray
>> > >>>> >   11  0.98% 98.50%   2388 301598 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>> > >>>> >   12  0.62% 99.12%   1522 301600 org.apache.lucene.store.DataInput.readVInt
>> > >>>> >   13  0.21% 99.33%    500 301612 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>> > >>>> >   14  0.07% 99.39%    167 301601 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>> > >>>> >   15  0.06% 99.45%    139 301619 java.lang.System.identityHashCode
>> > >>>> >   16  0.05% 99.50%    114 301632 org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>> > >>>> >   17  0.04% 99.54%     92 300708 java.util.zip.Inflater.inflateBytes
>> > >>>> >   18  0.03% 99.57%     76 301624 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>> > >>>> >   19  0.03% 99.59%     68 300613 java.lang.ClassLoader.defineClass1
>> > >>>> >   20  0.03% 99.62%     68 301615 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> > >>>> >   21  0.03% 99.65%     62 301635 org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>> > >>>> >   22  0.02% 99.66%     41 301664 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> > >>>> >   23  0.01% 99.68%     31 301629 org.apache.lucene.util.FixedBitSet.<init>
>> > >>>> > CPU SAMPLES END
>> > >>>> >
>> > >>>> > Solr 5.2.1:
>> > >>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>> > >>>> > rank   self  accum   count  trace method
>> > >>>> >    1 51.38% 51.38% 120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
>> > >>>> >    2 25.69% 77.07%  60477 301292 sun.nio.ch.ServerSocketChannelImpl.accept0
>> > >>>> >    3 10.59% 87.66%  24923 301369 org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>> > >>>> >    4  2.20% 89.86%   5182 301414 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>> > >>>> >    5  2.01% 91.87%   4742 301384 org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>> > >>>> >    6  1.25% 93.12%   2944 301434 java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>> > >>>> >    7  1.11% 94.23%   2612 301367 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>> > >>>> >    8  0.94% 95.17%   2204 301390 org.apache.lucene.util.BitSet.or
>> > >>>> >    9  0.93% 96.10%   2190 301383 java.nio.Bits.copyToArray
>> > >>>> >   10  0.78% 96.87%   1825 301449 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>> > >>>> >   11  0.73% 97.60%   1717 301378 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>> > >>>> >   12  0.73% 98.33%   1715 301374 org.apache.lucene.util.BitSet.or
>> > >>>> >   13  0.33% 98.66%    787 301387 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>> > >>>> >   14  0.16% 98.82%    374 301426 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>> > >>>> >   15  0.10% 98.93%    245 301382 org.apache.lucene.util.BitSet.or
>> > >>>> >   16  0.09% 99.02%    219 301381 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>> > >>>> >   17  0.09% 99.11%    207 301370 org.apache.lucene.util.BitSet.or
>> > >>>> >   18  0.06% 99.17%    153 301416 org.apache.lucene.util.BitSet.or
>> > >>>> >   19  0.06% 99.24%    151 301427 org.apache.lucene.util.BitSet.or
>> > >>>> >   20  0.06% 99.30%    151 301441 org.apache.lucene.store.DataInput.readVInt
>> > >>>> >   21  0.06% 99.36%    147 301389 java.lang.System.identityHashCode
>> > >>>> >   22  0.06% 99.42%    140 301375 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> > >>>> >   23  0.04% 99.47%    104 301425 org.apache.lucene.util.BitSet.or
>> > >>>> >   24  0.03% 99.50%     76 301423 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>> > >>>> >   25  0.03% 99.53%     74 301454 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>> > >>>> >   26  0.03% 99.56%     65 301432 org.apache.lucene.util.BitDocIdSet$Builder.or
>> > >>>> >   27  0.02% 99.58%     53 301456 org.apache.lucene.util.FixedBitSet.or
>> > >>>> >   28  0.02% 99.60%     52 300077 java.lang.ClassLoader.defineClass1
>> > >>>> >   29  0.02% 99.63%     50 301464 org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>> > >>>> >   30  0.02% 99.64%     39 301438 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>> > >>>> >   31  0.02% 99.66%     37 301465 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>> > >>>> >   32  0.02% 99.67%     36 301419 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>> > >>>> > CPU SAMPLES END
>> > >>>> >
>> > >>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <[email protected]> wrote:
>> > >>>> >>
>> > >>>> >> Hi Tomás,
>> > >>>> >>
>> > >>>> >> I suspect this might be related to LUCENE-5938. We changed the default rewrite method for multi-term queries to load documents into a sparse bit set first, and only upgrade to a dense bit set when we know many documents match. When there are lots of terms to intersect, then we end up spending significant cpu time to build the sparse bit set to eventually upgrade to a dense bit set like before. This might be what you are seeing.
>> > >>>> >>
>> > >>>> >> You might see the issue less with the population field because it has fewer unique values, so postings lists are longer and the DocIdSet building logic can upgrade quicker to a dense bit set.
>> > >>>> >>
>> > >>>> >> Mike noticed this slowness when working on BKD trees and we changed this first phase to use a plain int[] array that we sort and deduplicate instead of a more fancy sparse bit set (LUCENE-6645), which seemed to make things faster. Would it be possible for you to also check a 5.3 snapshot?
>> > >>>> >>
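To make the LUCENE-6645 idea above concrete, a toy version of the "plain int[] array that we sort and deduplicate" approach looks roughly like the sketch below. It is illustrative only, not the actual Lucene code, which is more involved and can still switch to a dense bit set when many documents match.

  import java.util.Arrays;

  // Toy illustration: buffer candidate doc IDs per matching term, then sort
  // and deduplicate once at the end instead of maintaining a sparse bit set.
  final class DocIdBufferSketch {
    private int[] docs = new int[64];
    private int size = 0;

    void add(int docId) {
      if (size == docs.length) {
        docs = Arrays.copyOf(docs, size * 2); // grow geometrically
      }
      docs[size++] = docId;
    }

    /** Returns the distinct doc IDs in increasing order. */
    int[] sortedDistinct() {
      Arrays.sort(docs, 0, size);
      int unique = 0;
      for (int i = 0; i < size; i++) {
        if (unique == 0 || docs[i] != docs[unique - 1]) {
          docs[unique++] = docs[i];
        }
      }
      return Arrays.copyOf(docs, unique);
    }
  }

The appeal is that the per-document work is just an array append, with a single sort at the end, which stays cheap when each of the many intersected terms only contributes a few documents.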
>> > >>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe <[email protected]> wrote:
>> > >>>> >> > Hi, I'm seeing some query performance degradation between 4.10.4 and 5.2.1. It doesn't happen with all the queries, but for queries like range queries on fields with many different values, the average time in 5.2.1 is worse than in 4.10.4. Is anyone seeing something similar?
>> > >>>> >> >
>> > >>>> >> > Test details:
>> > >>>> >> > * Single thread running queries continuously. I run the test twice for each Solr version.
>> > >>>> >> > * JMeter running on my laptop, Solr running on EC2, on an m3.xlarge instance with all the defaults but with a 5G heap. Index on local disk (SSD).
>> > >>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not in SolrCloud mode, no distributed search.
>> > >>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates during the test. Index size is around 1.1GB for Solr 5.2.1 and 1.3GB for Solr 4.10.4 (fits entirely in RAM).
>> > >>>> >> > * jdk1.8.0_45
>> > >>>> >> >
>> > >>>> >> > Queries: 3k boolean queries (generated with terms from the dataset) with range queries as filters on the "tlongitude" and "tlatitude" fields with randomly generated bounds, e.g.
>> > >>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>> > >>>> >> >
>> > >>>> >> > Fields are:
>> > >>>> >> > <field name="tlatitude" type="tdouble"/>
>> > >>>> >> > <field name="tlongitude" type="tdouble"/>
>> > >>>> >> > Field type:
>> > >>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
>> > >>>> >> >
>> > >>>> >> > In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1 on average.
>> > >>>> >> >
>> > >>>> >> > http://snag.gy/2yPPM.jpg
>> > >>>> >> >
>> > >>>> >> > Doing only the boolean queries shows no performance difference between 4.10 and 5.2, same thing if I do filters on a string field instead of the range queries.
>> > >>>> >> >
>> > >>>> >> > When using the "double" field type (precisionStep="0"), the difference was bigger:
>> > >>>> >> >
>> > >>>> >> > longitude/latitude fields:
>> > >>>> >> > <field name="longitude" type="double" docValues="true"/>
>> > >>>> >> > <field name="latitude" type="double" docValues="true"/>
>> > >>>> >> > <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
>> > >>>> >> >
>> > >>>> >> > http://snag.gy/Vi5uk.jpg
>> > >>>> >> > I understand this is not the best field type definition for range queries, I'm just trying to understand the difference between the two versions and why.
>> > >>>> >> >
>> > >>>> >> > Performance was OK when doing range queries on the "population" field (long), but that field doesn't have many different values, only 300k out of the 8.3M docs have the population field with a value different from 0. On the other hand, doing range queries on the _version_ field did show a graph similar to the previous one:
>> > >>>> >> >
>> > >>>> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
>> > >>>> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
>> > >>>> >> >
>> > >>>> >> > http://snag.gy/4tc7e.jpg
>> > >>>> >> >
>> > >>>> >> > Any idea what could be causing this? Is this expected after some known change?
>> > >>>> >> >
>> > >>>> >> > With Solr 4.10, a single CPU core remains high during the test (close to 100%), but with Solr 5.2, different cores go up and down in utilization continuously. That's probably because of the different Jetty version, I suppose. GC pattern looks similar in both.
>> > >>>> >> > For both Solr versions I'm using the settings that ship with Solr (in solr.in.sh) except for Xmx and Xms.
>> > >>>> >>
>> > >>>> >> --
>> > >>>> >> Adrien
>> > >>>>
>> > >>>> --
>> > >>>> Adrien
>> > >>
>> > >> --
>> > >> Anshum Gupta
>> > >
>> > > --
>> > > Uwe Schindler
>> > > H.-H.-Meier-Allee 63, 28213 Bremen
>> > > http://www.thetaphi.de
>> >
>> > --
>> > Adrien
