Tomás said Solr was running on EC2 in his initial email, so this can't be MacOSX.
On Tue, Aug 4, 2015 at 10:21 AM, Uwe Schindler <[email protected]> wrote:
> Hi,
>
> Just related: Do you run tests on MacOSX? NanoTime is damn slow there! And
> it's also not monotonic! There was a discussion on hotspot-dev about that
> one year ago, but no solution until now.
>
> MacOSX has no real nanotime implementation in the kernel, so Java falls back
> to the wall clock, which is slow and non-monotonic with NTP or VMware...
> (see test failures in the past on Policeman when NTP adjusted the clock).
>
> If you run on Linux, this could also be a good test: on OSX the difference
> to the previous version should be even larger.
>
> Uwe
>
>
> On 4 August 2015 03:07:45 MESZ, Anshum Gupta <[email protected]> wrote:
>>
>> Guess bypassing that check and always returning true instead would help
>> confirm that. If that's the case, we should just initialize the timeout to
>> Long.MAX_VALUE and check for that to short-circuit?
>>
>> On Mon, Aug 3, 2015 at 3:50 PM, Tomás Fernández Löbbe
>> <[email protected]> wrote:
>>>
>>> Yes, I saw that, but thought it could be the underlying implementation,
>>> not the "ExitableTermsEnum" wrapper. Maybe it's related to the calls to
>>> System.nanoTime then...
>>>
>>> On Mon, Aug 3, 2015 at 3:11 PM, Adrien Grand <[email protected]> wrote:
>>>>
>>>> Thanks for sharing the traces, it looks like my intuition was wrong. :)
>>>> They seem to point to ExitableDirectoryReader$ExitableTermsEnum.next,
>>>> which checks whether the time is out before delegating.
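(To make the suggestion above concrete: below is a minimal, self-contained
sketch of the pattern being discussed -- a wrapper that consults the clock on
every next() call, plus the Long.MAX_VALUE short-circuit Anshum proposes for
the "no timeout configured" case. It is illustrative only, not the actual
ExitableDirectoryReader source; all class and method names are made up.)

// Illustrative sketch only -- not the actual Lucene ExitableDirectoryReader code.
// Invented names (TimeLimitedTermsIterator, NO_TIMEOUT, checkTimeout) are used to
// show the pattern: a wall-clock check on every next() call, and a sentinel that
// lets the wrapper skip the clock entirely when no timeout is configured.
import java.util.Iterator;

public class TimeLimitedTermsIterator {

    public static final long NO_TIMEOUT = Long.MAX_VALUE; // sentinel: timeouts disabled

    private final Iterator<String> delegate; // stands in for the wrapped terms enumeration
    private final long deadlineNanos;        // absolute deadline, or NO_TIMEOUT

    public TimeLimitedTermsIterator(Iterator<String> delegate, long timeoutMillis) {
        this.delegate = delegate;
        this.deadlineNanos = timeoutMillis <= 0
                ? NO_TIMEOUT
                : System.nanoTime() + timeoutMillis * 1_000_000L;
    }

    private void checkTimeout() {
        // The short-circuit: when no timeout is configured, return immediately
        // instead of paying for a System.nanoTime() call once per term.
        if (deadlineNanos == NO_TIMEOUT) {
            return;
        }
        if (System.nanoTime() - deadlineNanos > 0) { // overflow-safe comparison
            throw new RuntimeException("Time limit exceeded while iterating terms");
        }
    }

    /** Checks the clock (unless disabled), then delegates, like the wrapper in the traces. */
    public String next() {
        checkTimeout();
        return delegate.hasNext() ? delegate.next() : null;
    }
}

Whether this helps depends on whether the per-call clock read or the extra
delegation itself is the expensive part, which is exactly what bypassing the
check (as suggested above) would reveal.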
>>>>
>>>> On Mon, Aug 3, 2015 at 7:21 PM, Tomás Fernández Löbbe
>>>> <[email protected]> wrote:
>>>> > Thanks Adrien,
>>>> > I'll run the tests with 5.3 snapshot and post the results here. In case
>>>> > this helps, this is the hprof samples output
>>>> > (-Xrunhprof:cpu=samples,depth=3,file=/home/ec2-user/hprof_output.txt)
>>>> > for 4.10.4 and 5.2.1 in my test:
>>>> >
>>>> > Solr 4.10.4:
>>>> > CPU SAMPLES BEGIN (total = 243525) Fri Jul 31 22:29:06 2015
>>>> > rank   self  accum   count  trace method
>>>> >    1 75.07% 75.07%  182812 300523 java.net.PlainSocketImpl.socketAccept
>>>> >    2  4.27% 79.34%   10408 301576 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData
>>>> >    3  4.15% 83.49%   10108 301585 org.apache.lucene.index.FilteredTermsEnum.docs
>>>> >    4  3.46% 86.95%    8419 301582 org.apache.lucene.index.FilteredTermsEnum.next
>>>> >    5  2.49% 89.44%    6070 301573 java.net.SocketInputStream.socketRead0
>>>> >    6  1.99% 91.43%    4848 301599 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>>> >    7  1.98% 93.42%    4824 301583 org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet
>>>> >    8  1.57% 94.99%    3824 301589 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>>> >    9  1.44% 96.42%    3504 301594 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs
>>>> >   10  1.09% 97.51%    2655 301581 java.nio.Bits.copyToArray
>>>> >   11  0.98% 98.50%    2388 301598 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc
>>>> >   12  0.62% 99.12%    1522 301600 org.apache.lucene.store.DataInput.readVInt
>>>> >   13  0.21% 99.33%     500 301612 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs
>>>> >   14  0.07% 99.39%     167 301601 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>>> >   15  0.06% 99.45%     139 301619 java.lang.System.identityHashCode
>>>> >   16  0.05% 99.50%     114 301632 org.apache.lucene.codecs.lucene41.ForUtil.readBlock
>>>> >   17  0.04% 99.54%      92 300708 java.util.zip.Inflater.inflateBytes
>>>> >   18  0.03% 99.57%      76 301624 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>>> >   19  0.03% 99.59%      68 300613 java.lang.ClassLoader.defineClass1
>>>> >   20  0.03% 99.62%      68 301615 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>>> >   21  0.03% 99.65%      62 301635 org.apache.solr.search.SolrIndexSearcher.getDocSetNC
>>>> >   22  0.02% 99.66%      41 301664 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>>> >   23  0.01% 99.68%      31 301629 org.apache.lucene.util.FixedBitSet.<init>
>>>> > CPU SAMPLES END
>>>> >
>>>> > Solr 5.2.1:
>>>> > CPU SAMPLES BEGIN (total = 235415) Fri Jul 31 22:42:06 2015
>>>> > rank   self  accum   count  trace method
>>>> >    1 51.38% 51.38%  120954 301291 sun.nio.ch.EPollArrayWrapper.epollWait
>>>> >    2 25.69% 77.07%   60477 301292 sun.nio.ch.ServerSocketChannelImpl.accept0
>>>> >    3 10.59% 87.66%   24923 301369 org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.next
>>>> >    4  2.20% 89.86%    5182 301414 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings
>>>> >    5  2.01% 91.87%    4742 301384 org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.postings
>>>> >    6  1.25% 93.12%    2944 301434 java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss
>>>> >    7  1.11% 94.23%    2612 301367 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>>> >    8  0.94% 95.17%    2204 301390 org.apache.lucene.util.BitSet.or
>>>> >    9  0.93% 96.10%    2190 301383 java.nio.Bits.copyToArray
>>>> >   10  0.78% 96.87%    1825 301449 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs
>>>> >   11  0.73% 97.60%    1717 301378 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll
>>>> >   12  0.73% 98.33%    1715 301374 org.apache.lucene.util.BitSet.or
>>>> >   13  0.33% 98.66%     787 301387 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>>> >   14  0.16% 98.82%     374 301426 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>>> >   15  0.10% 98.93%     245 301382 org.apache.lucene.util.BitSet.or
>>>> >   16  0.09% 99.02%     219 301381 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next
>>>> >   17  0.09% 99.11%     207 301370 org.apache.lucene.util.BitSet.or
>>>> >   18  0.06% 99.17%     153 301416 org.apache.lucene.util.BitSet.or
>>>> >   19  0.06% 99.24%     151 301427 org.apache.lucene.util.BitSet.or
>>>> >   20  0.06% 99.30%     151 301441 org.apache.lucene.store.DataInput.readVInt
>>>> >   21  0.06% 99.36%     147 301389 java.lang.System.identityHashCode
>>>> >   22  0.06% 99.42%     140 301375 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>>> >   23  0.04% 99.47%     104 301425 org.apache.lucene.util.BitSet.or
>>>> >   24  0.03% 99.50%      76 301423 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>>> >   25  0.03% 99.53%      74 301454 org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite
>>>> >   26  0.03% 99.56%      65 301432 org.apache.lucene.util.BitDocIdSet$Builder.or
>>>> >   27  0.02% 99.58%      53 301456 org.apache.lucene.util.FixedBitSet.or
>>>> >   28  0.02% 99.60%      52 300077 java.lang.ClassLoader.defineClass1
>>>> >   29  0.02% 99.63%      50 301464 org.apache.lucene.codecs.lucene50.ForUtil.readBlock
>>>> >   30  0.02% 99.64%      39 301438 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.next
>>>> >   31  0.02% 99.66%      37 301465 org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadNextFloorBlock
>>>> >   32  0.02% 99.67%      36 301419 org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc
>>>> > CPU SAMPLES END
>>>> >
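(Side note on the nanoTime angle: given how prominently
ExitableDirectoryReader$ExitableTermsEnum.next shows up in the 5.2.1 samples, a
rough throwaway loop like the one below gives a ballpark figure for the per-call
cost of System.nanoTime() on the machine running Solr. It is only a sketch; a
proper measurement would use JMH.)

// Throwaway microbenchmark sketch, illustrative only (JMH would be more reliable):
// estimates the average per-call cost of System.nanoTime() on this machine.
public class NanoTimeCost {
    public static void main(String[] args) {
        final int iterations = 50_000_000;
        long sink = 0;

        // Warm-up so the JIT has compiled the loop before we measure.
        for (int i = 0; i < 1_000_000; i++) {
            sink += System.nanoTime();
        }

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;

        // Print sink so the JIT cannot eliminate the calls as dead code.
        System.out.printf("~%.1f ns per System.nanoTime() call (sink=%d)%n",
                (double) elapsed / iterations, sink);
    }
}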
>>>> > On Fri, Jul 31, 2015 at 4:23 PM, Adrien Grand <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> Hi Tomás,
>>>> >>
>>>> >> I suspect this might be related to LUCENE-5938. We changed the default
>>>> >> rewrite method for multi-term queries to load documents into a sparse
>>>> >> bit set first, and only upgrade to a dense bit set when we know many
>>>> >> documents match. When there are lots of terms to intersect, we end up
>>>> >> spending significant CPU time building the sparse bit set only to
>>>> >> eventually upgrade to a dense bit set like before. This might be what
>>>> >> you are seeing.
>>>> >>
>>>> >> You might see the issue less with the population field because it has
>>>> >> fewer unique values, so postings lists are longer and the DocIdSet
>>>> >> building logic can upgrade to a dense bit set more quickly.
>>>> >>
>>>> >> Mike noticed this slowness when working on BKD trees, and we changed
>>>> >> this first phase to use a plain int[] array that we sort and
>>>> >> deduplicate instead of a fancier sparse bit set (LUCENE-6645), which
>>>> >> seemed to make things faster. Would it be possible for you to also
>>>> >> check a 5.3 snapshot?
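(For reference, a simplified sketch of the "plain int[] that we sort and
deduplicate" idea behind LUCENE-6645. The actual Lucene implementation is
considerably more involved; this only illustrates why buffering doc IDs in a
flat array, then sorting and deduplicating once, avoids maintaining a sparse
structure that may end up being upgraded to a dense bit set anyway. Class and
method names below are invented for the example.)

import java.util.Arrays;

// Illustrative sketch, not the actual Lucene code: collect candidate doc IDs
// from many terms into a growing int[], then sort and deduplicate once at the
// end to obtain an ordered, duplicate-free doc ID list.
public class IntBufferDocIdCollector {
    private int[] buffer = new int[64];
    private int size = 0;

    public void add(int docId) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length << 1); // amortized growth
        }
        buffer[size++] = docId;
    }

    /** Returns the collected doc IDs in increasing order, without duplicates. */
    public int[] buildSortedUnique() {
        Arrays.sort(buffer, 0, size);
        int unique = 0;
        for (int i = 0; i < size; i++) {
            // Keep a value only if it differs from the last one we kept.
            if (unique == 0 || buffer[i] != buffer[unique - 1]) {
                buffer[unique++] = buffer[i];
            }
        }
        return Arrays.copyOf(buffer, unique);
    }
}

The appeal of this shape is that appending to an int[] is cheap and
cache-friendly per matching document, and the sort/dedup cost is paid once at
the end instead of on every insertion into a sparse set.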
>>>> >>
>>>> >> On Fri, Jul 31, 2015 at 10:51 PM, Tomás Fernández Löbbe
>>>> >> <[email protected]> wrote:
>>>> >> > Hi, I'm seeing some query performance degradation between 4.10.4 and
>>>> >> > 5.2.1. It doesn't happen with all the queries, but for queries like
>>>> >> > range queries on fields with many different values, the average time
>>>> >> > in 5.2.1 is worse than in 4.10.4. Is anyone seeing something similar?
>>>> >> >
>>>> >> > Test details:
>>>> >> > * Single thread running queries continuously. I run the test twice
>>>> >> >   for each Solr version.
>>>> >> > * JMeter running on my laptop, Solr running on EC2 on an m3.xlarge
>>>> >> >   instance with all the defaults but with a 5G heap. Index on local
>>>> >> >   disk (SSD).
>>>> >> > * Plain Solr releases, nothing custom. Single Solr core, not in
>>>> >> >   SolrCloud mode, no distributed search.
>>>> >> > * "allCountries" geonames dataset (~8M small docs). No updates during
>>>> >> >   the test. Index size is around 1.1GB for Solr 5.2.1 and 1.3GB for
>>>> >> >   Solr 4.10.4 (fits entirely in RAM).
>>>> >> > * jdk1.8.0_45
>>>> >> >
>>>> >> > Queries: 3k boolean queries (generated with terms from the dataset)
>>>> >> > with range queries as filters on the "tlongitude" and "tlatitude"
>>>> >> > fields with randomly generated bounds, e.g.
>>>> >> > q=name:foo OR name:bar&fq=tlongitude:[W TO X]&fq=tlatitude:[Y TO Z]
>>>> >> >
>>>> >> > Fields are:
>>>> >> > <field name="tlatitude" type="tdouble"/>
>>>> >> > <field name="tlongitude" type="tdouble"/>
>>>> >> > Field type:
>>>> >> > <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
>>>> >> > positionIncrementGap="0"/>
>>>> >> >
>>>> >> > In this case, Solr 4.10.4 was between 20% and 30% faster than 5.2.1
>>>> >> > on average.
>>>> >> >
>>>> >> > http://snag.gy/2yPPM.jpg
>>>> >> >
>>>> >> > Doing only the boolean queries shows no performance difference
>>>> >> > between 4.10 and 5.2; same thing if I filter on a string field
>>>> >> > instead of using the range queries.
>>>> >> >
>>>> >> > When using the "double" field type (precisionStep="0"), the
>>>> >> > difference was bigger:
>>>> >> >
>>>> >> > longitude/latitude fields:
>>>> >> > <field name="longitude" type="double" docValues="true"/>
>>>> >> > <field name="latitude" type="double" docValues="true"/>
>>>> >> > <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
>>>> >> > positionIncrementGap="0"/>
>>>> >> >
>>>> >> > http://snag.gy/Vi5uk.jpg
>>>> >> >
>>>> >> > I understand this is not the best field type definition for range
>>>> >> > queries; I'm just trying to understand the difference between the
>>>> >> > two versions and why.
>>>> >> >
>>>> >> > Performance was OK when doing range queries on the "population"
>>>> >> > field (long), but that field doesn't have many different values:
>>>> >> > only 300k out of the 8.3M docs have a population value different
>>>> >> > from 0. On the other hand, doing range queries on the _version_
>>>> >> > field did show a graph similar to the previous one:
>>>> >> >
>>>> >> > <field name="_version_" type="long" indexed="true" stored="true"/>
>>>> >> > <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
>>>> >> > positionIncrementGap="0"/>
>>>> >> >
>>>> >> > http://snag.gy/4tc7e.jpg
>>>> >> >
>>>> >> > Any idea what could be causing this? Is this expected after some
>>>> >> > known change?
>>>> >> >
>>>> >> > With Solr 4.10, a single CPU core stays busy during the test (close
>>>> >> > to 100%), but with Solr 5.2, different cores go up and down in
>>>> >> > utilization continuously; that's probably because of the different
>>>> >> > Jetty version, I suppose. The GC pattern looks similar in both. For
>>>> >> > both Solr versions I'm using the settings that ship with Solr (in
>>>> >> > solr.in.sh), except for Xmx and Xms.
>>>> >>
>>>> >> --
>>>> >> Adrien
>>>> >>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>
>>
>>
>> --
>> Anshum Gupta
>
>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de

--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
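(For anyone who wants to reproduce the comparison without JMeter: below is a
rough SolrJ sketch of the query loop described in the original report -- a
single thread issuing a boolean query plus two random range filters on
tlongitude/tlatitude. The field names and query shape come from the e-mail;
the core URL, term list, bound generation, and iteration count are placeholders
invented for the example, and averaging the Solr-reported QTime is only a crude
stand-in for proper client-side latency measurement.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

import java.util.Random;

// Sketch of the single-threaded benchmark loop described above (not the original
// JMeter test plan). The URL and terms below are placeholders.
public class RangeFilterBenchSketch {
    public static void main(String[] args) throws Exception {
        Random random = new Random(42);
        String[] terms = {"foo", "bar", "baz"}; // placeholder terms from the dataset

        try (HttpSolrClient client =
                     new HttpSolrClient("http://localhost:8983/solr/geonames")) {
            final int iterations = 3000;
            long totalQTimeMs = 0;

            for (int i = 0; i < iterations; i++) {
                // Randomly generated bounds, as in the reported test.
                double lonMin = -180 + 360 * random.nextDouble();
                double lonMax = Math.min(180, lonMin + 10 * random.nextDouble());
                double latMin = -90 + 180 * random.nextDouble();
                double latMax = Math.min(90, latMin + 10 * random.nextDouble());

                SolrQuery q = new SolrQuery("name:" + terms[i % terms.length]
                        + " OR name:" + terms[(i + 1) % terms.length]);
                q.addFilterQuery("tlongitude:[" + lonMin + " TO " + lonMax + "]");
                q.addFilterQuery("tlatitude:[" + latMin + " TO " + latMax + "]");

                QueryResponse rsp = client.query(q);
                totalQTimeMs += rsp.getQTime();
            }

            System.out.println("avg QTime: "
                    + (totalQTimeMs / (double) iterations) + " ms");
        }
    }
}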
