Hi Anton,

I don't see anything obvious. However, something that surprised me is that the
number of matches looks quite different in the filter and query cases: are we
actually comparing oranges to oranges?
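Independently of that, Lucene 5.2's built-in query cache is worth a look, since it only starts caching a clause once it has seen evidence of reuse. Wiring it up looks roughly like this (a sketch only: the cache sizes are made up for illustration, and `reader` stands for your already-open IndexReader):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.LRUQueryCache;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.search.UsageTrackingQueryCachingPolicy;

// Cache up to 1000 queries / 32 MB of doc id sets; these numbers are
// illustrative only, tune them for your index.
LRUQueryCache queryCache = new LRUQueryCache(1000, 32 * 1024 * 1024);

// Only cache queries once the policy has seen evidence of reuse.
QueryCachingPolicy policy = new UsageTrackingQueryCachingPolicy();

IndexSearcher searcher = new IndexSearcher(reader); // reader: your open IndexReader
searcher.setQueryCache(queryCache);
searcher.setQueryCachingPolicy(policy);
```

With this in place you would not wrap anything in CachingWrapperQuery yourself; you would just add the frequently reused clauses as FILTER clauses of the top-level BooleanQuery and let the cache decide what is worth keeping.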
I'm wondering if the explanation could lie in the fact that the initial
caching phase is more costly with CachingWrapperQuery than with
CachingWrapperFilter, because it does not reuse doc id sets directly when
they are cacheable and always copies their content. So maybe the underlying
issue is some over-caching of filters. If this is actually the case, you
might want to look into Lucene's built-in QueryCache, which only starts
caching filters once there is evidence that they are being reused.

On Wed, Jun 17, 2015 at 1:54 PM, Anton Lyska <profondome...@gmail.com> wrote:
> Hi Adrien,
>
> "Can you confirm that the response times that you are reporting both
> happen on Lucene 5.2"
> - No: I upgraded Lucene from 5.0 to 5.2, and only on 5.2 did I try
> CachingWrapperQuery.
>
> "on a "hot cache""
> - No, because I have a lot of different queries, and therefore different
> filters. I use an LRU cache, so when a filter is not in the cache, I
> create a new one. Nevertheless, my timings include the warm-up of both
> CachingWrapperFilter and CachingWrapperQuery.
>
> Yesterday I reverted my sources to use filters, and added some of the
> statistics that you had asked for.
> I will provide stats for only one query.
>
> - index size: 18M docs
>
> - response time with caching:
>
> reponseTimeWithCachingMs.samples | 59487
> reponseTimeWithCachingMs.min | 0
> reponseTimeWithCachingMs.avg | 53.68003093112781
> reponseTimeWithCachingMs.max | 2530
> reponseTimeWithCachingMs[-INF; 0) | 0 (0.0000%; 0.0000%)
> reponseTimeWithCachingMs[0; 10) | 216 (0.3631%; 0.3631%)
> reponseTimeWithCachingMs[10; 20) | 42 (0.0706%; 0.4337%)
> reponseTimeWithCachingMs[20; 30) | 12133 (20.3961%; 20.8298%)
> reponseTimeWithCachingMs[30; 40) | 25749 (43.2851%; 64.1148%)
> reponseTimeWithCachingMs[40; 50) | 9111 (15.3160%; 79.4308%)
> reponseTimeWithCachingMs[50; 60) | 3572 (6.0047%; 85.4355%)
> reponseTimeWithCachingMs[60; 70) | 1565 (2.6308%; 88.0663%)
> reponseTimeWithCachingMs[70; 80) | 1062 (1.7853%; 89.8516%)
> reponseTimeWithCachingMs[80; 90) | 535 (0.8994%; 90.7509%)
> reponseTimeWithCachingMs[90; 100) | 390 (0.6556%; 91.4065%)
> reponseTimeWithCachingMs[100; 110) | 446 (0.7497%; 92.1563%)
> reponseTimeWithCachingMs[110; 120) | 476 (0.8002%; 92.9564%)
> reponseTimeWithCachingMs[120; 130) | 380 (0.6388%; 93.5952%)
> reponseTimeWithCachingMs[130; 140) | 189 (0.3177%; 93.9130%)
> reponseTimeWithCachingMs[140; 150) | 157 (0.2639%; 94.1769%)
> reponseTimeWithCachingMs[150; 160) | 106 (0.1782%; 94.3551%)
> reponseTimeWithCachingMs[160; 170) | 94 (0.1580%; 94.5131%)
> reponseTimeWithCachingMs[170; 180) | 82 (0.1378%; 94.6509%)
> reponseTimeWithCachingMs[180; 190) | 126 (0.2118%; 94.8627%)
> reponseTimeWithCachingMs[190; 200) | 129 (0.2169%; 95.0796%)
> reponseTimeWithCachingMs[200; 210) | 140 (0.2353%; 95.3149%)
> reponseTimeWithCachingMs[210; 220) | 143 (0.2404%; 95.5553%)
> reponseTimeWithCachingMs[220; 230) | 177 (0.2975%; 95.8529%)
> reponseTimeWithCachingMs[230; 240) | 224 (0.3766%; 96.2294%)
> reponseTimeWithCachingMs[240; 250) | 222 (0.3732%; 96.6026%)
> reponseTimeWithCachingMs[250; 260) | 220 (0.3698%; 96.9725%)
> reponseTimeWithCachingMs[260; 270) | 212 (0.3564%; 97.3288%)
> reponseTimeWithCachingMs[270; 280) | 175 (0.2942%; 97.6230%)
> reponseTimeWithCachingMs[280; 290) | 213 (0.3581%; 97.9811%)
> reponseTimeWithCachingMs[290; 300) | 268 (0.4505%; 98.4316%)
> reponseTimeWithCachingMs[300; 310) | 225 (0.3782%; 98.8098%)
> reponseTimeWithCachingMs[310; 320) | 155 (0.2606%; 99.0704%)
> reponseTimeWithCachingMs[320; 330) | 116 (0.1950%; 99.2654%)
> reponseTimeWithCachingMs[330; 340) | 72 (0.1210%; 99.3864%)
> reponseTimeWithCachingMs[340; 350) | 49 (0.0824%; 99.4688%)
> reponseTimeWithCachingMs[350; 360) | 46 (0.0773%; 99.5461%)
> reponseTimeWithCachingMs[360; 370) | 43 (0.0723%; 99.6184%)
> ...
>
> - response time without caching:
> (I disabled caching on the live app and gathered some amount of samples)
>
> reponseTimeWithoutCachingMs.samples | 1525
> reponseTimeWithoutCachingMs.min | 8
> reponseTimeWithoutCachingMs.avg | 115.27213114754099
> reponseTimeWithoutCachingMs.max | 893
> reponseTimeWithoutCachingMs[-INF; 0) | 0 (0.0000%; 0.0000%)
> reponseTimeWithoutCachingMs[0; 10) | 2 (0.1311%; 0.1311%)
> reponseTimeWithoutCachingMs[10; 20) | 0 (0.0000%; 0.1311%)
> reponseTimeWithoutCachingMs[20; 30) | 0 (0.0000%; 0.1311%)
> reponseTimeWithoutCachingMs[30; 40) | 29 (1.9016%; 2.0328%)
> reponseTimeWithoutCachingMs[40; 50) | 193 (12.6557%; 14.6885%)
> reponseTimeWithoutCachingMs[50; 60) | 136 (8.9180%; 23.6066%)
> reponseTimeWithoutCachingMs[60; 70) | 110 (7.2131%; 30.8197%)
> reponseTimeWithoutCachingMs[70; 80) | 107 (7.0164%; 37.8361%)
> reponseTimeWithoutCachingMs[80; 90) | 134 (8.7869%; 46.6230%)
> reponseTimeWithoutCachingMs[90; 100) | 111 (7.2787%; 53.9016%)
> reponseTimeWithoutCachingMs[100; 110) | 36 (2.3607%; 56.2623%)
> reponseTimeWithoutCachingMs[110; 120) | 48 (3.1475%; 59.4098%)
> reponseTimeWithoutCachingMs[120; 130) | 37 (2.4262%; 61.8361%)
> reponseTimeWithoutCachingMs[130; 140) | 39 (2.5574%; 64.3934%)
> reponseTimeWithoutCachingMs[140; 150) | 56 (3.6721%; 68.0656%)
> reponseTimeWithoutCachingMs[150; 160) | 50 (3.2787%; 71.3443%)
> reponseTimeWithoutCachingMs[160; 170) | 58 (3.8033%; 75.1475%)
> reponseTimeWithoutCachingMs[170; 180) | 71 (4.6557%; 79.8033%)
> reponseTimeWithoutCachingMs[180; 190) | 69 (4.5246%; 84.3279%)
> reponseTimeWithoutCachingMs[190; 200) | 73 (4.7869%; 89.1148%)
> reponseTimeWithoutCachingMs[200; 210) | 50 (3.2787%; 92.3934%)
> reponseTimeWithoutCachingMs[210; 220) | 44 (2.8852%; 95.2787%)
> reponseTimeWithoutCachingMs[220; 230) | 15 (0.9836%; 96.2623%)
> reponseTimeWithoutCachingMs[230; 240) | 11 (0.7213%; 96.9836%)
> reponseTimeWithoutCachingMs[240; 250) | 12 (0.7869%; 97.7705%)
> reponseTimeWithoutCachingMs[250; 260) | 13 (0.8525%; 98.6229%)
> reponseTimeWithoutCachingMs[260; 270) | 10 (0.6557%; 99.2787%)
> reponseTimeWithoutCachingMs[270; 280) | 6 (0.3934%; 99.6721%)
> reponseTimeWithoutCachingMs[280; 290) | 1 (0.0656%; 99.7377%)
> reponseTimeWithoutCachingMs[290; 300) | 2 (0.1311%; 99.8689%)
> reponseTimeWithoutCachingMs[300; 310) | 1 (0.0656%; 99.9344%)
>
> - how many documents the cached filter matches:
>
> filterMatchCount.samples | 24071
> filterMatchCount.min | 0
> filterMatchCount.avg | 17765.553736861784
> filterMatchCount.max | 438929
> filterMatchCount[-INF; 0) | 0 (0.0000%; 0.0000%)
> filterMatchCount[0; 1000) | 10233 (42.5117%; 42.5117%)
> filterMatchCount[1000; 2000) | 2338 (9.7129%; 52.2247%)
> filterMatchCount[2000; 3000) | 1330 (5.5253%; 57.7500%)
> filterMatchCount[3000; 4000) | 950 (3.9467%; 61.6966%)
> filterMatchCount[4000; 5000) | 642 (2.6671%; 64.3638%)
> filterMatchCount[5000; 6000) | 582 (2.4178%; 66.7816%)
> filterMatchCount[6000; 7000) | 481 (1.9983%; 68.7799%)
> filterMatchCount[7000; 8000) | 494 (2.0523%; 70.8321%)
> filterMatchCount[8000; 9000) | 351 (1.4582%; 72.2903%)
> filterMatchCount[9000; 10000) | 298 (1.2380%; 73.5283%)
> filterMatchCount[10000; 11000) | 255 (1.0594%; 74.5877%)
> filterMatchCount[11000; 12000) | 194 (0.8059%; 75.3936%)
> filterMatchCount[12000; 13000) | 209 (0.8683%; 76.2619%)
> filterMatchCount[13000; 14000) | 191 (0.7935%; 77.0554%)
> filterMatchCount[14000; 15000) | 183 (0.7603%; 77.8156%)
> filterMatchCount[15000; 16000) | 216 (0.8973%; 78.7130%)
> filterMatchCount[16000; 17000) | 146 (0.6065%; 79.3195%)
> filterMatchCount[17000; 18000) | 190 (0.7893%; 80.1088%)
> filterMatchCount[18000; 19000) | 134 (0.5567%; 80.6655%)
> filterMatchCount[19000; 20000) | 174 (0.7229%; 81.3884%)
> filterMatchCount[20000; +INF) | 4480 (18.6116%; 100.0000%)
>
> - how many documents the query matches:
>
> queryMatchCount.samples | 24198
> queryMatchCount.min | 0
> queryMatchCount.avg | 2489.612240681048
> queryMatchCount.max | 75347
> queryMatchCount[-INF; 0) | 0 (0.0000%; 0.0000%)
> queryMatchCount[0; 1000) | 16339 (67.5221%; 67.5221%)
> queryMatchCount[1000; 2000) | 2210 (9.1330%; 76.6551%)
> queryMatchCount[2000; 3000) | 1242 (5.1327%; 81.7878%)
> queryMatchCount[3000; 4000) | 792 (3.2730%; 85.0607%)
> queryMatchCount[4000; 5000) | 482 (1.9919%; 87.0527%)
> queryMatchCount[5000; 6000) | 545 (2.2523%; 89.3049%)
> queryMatchCount[6000; 7000) | 444 (1.8349%; 91.1398%)
> queryMatchCount[7000; 8000) | 248 (1.0249%; 92.1646%)
> queryMatchCount[8000; 9000) | 166 (0.6860%; 92.8506%)
> queryMatchCount[9000; 10000) | 172 (0.7108%; 93.5615%)
> queryMatchCount[10000; 11000) | 91 (0.3761%; 93.9375%)
> queryMatchCount[11000; 12000) | 85 (0.3513%; 94.2888%)
> queryMatchCount[12000; 13000) | 59 (0.2438%; 94.5326%)
> queryMatchCount[13000; 14000) | 58 (0.2397%; 94.7723%)
> queryMatchCount[14000; 15000) | 37 (0.1529%; 94.9252%)
> queryMatchCount[15000; 16000) | 164 (0.6777%; 95.6029%)
> queryMatchCount[16000; 17000) | 36 (0.1488%; 95.7517%)
> queryMatchCount[17000; 18000) | 25 (0.1033%; 95.8550%)
> queryMatchCount[18000; 19000) | 109 (0.4505%; 96.3055%)
> queryMatchCount[19000; 20000) | 25 (0.1033%; 96.4088%)
> queryMatchCount[20000; +INF) | 869 (3.5912%; 100.0000%)
>
> I think I should mention a few more details.
> First, there is a difference between the sample counts of
> queryMatchCount/filterMatchCount and reponseTimeWithCachingMs.
> This is because I need to execute one extra query to get filterMatchCount
> ( searcher.search(new MatchAllDocsQuery(), searchFilter,
> totalHitCountCollector); ), and I do this only for the first search, not
> when executing searchAfter. But reponseTimeWithCachingMs also includes the
> time of searchAfter.
>
> Second, about the stats: I treat queries such as "doc.fieldA=1 and
> doc.fieldB=2" and "doc.fieldA=3 and doc.fieldB=4" as queries of the same
> type and aggregate them into one stat. That is why there is a big variance
> in filterMatchCount and queryMatchCount.
>
> And finally, as you may notice, the query with the filter takes 53 ms,
> while in my previous letter I reported a much smaller number (22 ms for
> this query). This may be because I was too lazy while reverting the
> sources: instead of reverting to BooleanFilter, TermFilter and
> NumericRangeFilter (as it was before), I just created the chain
> CachingWrapperFilter(QueryWrapperFilter( BooleanQuery (TermQueries...
> NumericRangeQueries) )).
> A second possible reason for the difference: the 22 ms was on Lucene 5.0.
>
> Nevertheless, on Lucene 5.2 the query takes 95 ms with CachingWrapperQuery,
> 53 ms with CachingWrapperFilter, and 115 ms without any caching.
>
> Huh, I think that's all. I would appreciate any advice.
>
> Regards,
> Anton
>
> 2015-06-16 0:35 GMT+03:00 Adrien Grand <jpou...@gmail.com>:
>
>> Hi Anton,
>>
>> Thanks for reporting this. It is indeed a bit surprising given that
>> both classes work in a very similar way. Can you confirm that the
>> response times that you are reporting both happen on Lucene 5.2 (even
>> with CachingWrapperFilter) and on a "hot cache" (so that they don't
>> include the generation of the cache entries)?
>>
>> Could you also provide us with the following information:
>> - what are the response times without caching?
>> - how many docs do you have in the index?
>> - how many documents does the cached filter match?
>> - how many documents do your two popular queries match?
>>
>> If you have the opportunity to perform some profiling of your query
>> execution, this might be useful as well. Thanks!
>>
>> On Mon, Jun 15, 2015 at 10:56 AM, Anton Lyska <profondome...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I have performance issues with CachingWrapperQuery on Lucene 5.2 and
>> > don't know how to solve them.
>> >
>> > Prehistory:
>> > I have a search with different parameters, where some parameters are
>> > used more frequently than others. For these params I used filters (and
>> > cached them), and my search looked like this:
>> >
>> > BooleanFilter bf = new BooleanFilter();
>> > bf.add(.., BooleanClause.Occur.MUST); // frequently used params here
>> > bf.add(.., BooleanClause.Occur.MUST);
>> >
>> > // this filter object was reused between queries
>> > Filter searchFilter = new CachingWrapperFilter(bf);
>> >
>> > BooleanQuery searchQuery = new BooleanQuery();
>> > searchQuery.add(.., BooleanClause.Occur.MUST);
>> > searchQuery.add(.., BooleanClause.Occur.MUST);
>> >
>> > hits = searcher.search(searchQuery, searchFilter, count, sort);
>> >
>> > And everything was great. I looked at two popular queries, and they
>> > took 32 ms and 22 ms to execute.
>> >
>> > Then I upgraded Lucene to 5.2 (from 5.0), and saw that filters are
>> > deprecated and it's advisable to use queries and CachingWrapperQuery.
>> > So I changed the sources to this:
>> >
>> > BooleanQuery bq = new BooleanQuery(); // now it becomes a BooleanQuery
>> > // frequently used params here, same terms as before
>> > bq.add(.., BooleanClause.Occur.MUST);
>> > bq.add(.., BooleanClause.Occur.MUST);
>> >
>> > // this object is reused between queries
>> > Query cachedQuery = new CachingWrapperQuery(bq);
>> >
>> > BooleanQuery searchQuery = new BooleanQuery();
>> > searchQuery.add(.., BooleanClause.Occur.MUST);
>> > searchQuery.add(.., BooleanClause.Occur.MUST);
>> > // here I add the cached part of the query
>> > searchQuery.add(cachedQuery, BooleanClause.Occur.FILTER);
>> >
>> > hits = searcher.search(searchQuery, count, sort);
>> >
>> > Later I looked at the same queries, and they took 145 ms and 95 ms to
>> > execute!
>> >
>> > Moving to CachingWrapperQuery was the only difference between the
>> > queries, so the question is: how do I use CachingWrapperQuery correctly
>> > in my situation, or should I switch back to filters?
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Adrien