[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

Michael McCandless (JIRA) Tue, 30 Mar 2010 05:36:55 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851372#action_12851372
 ]


Michael McCandless commented on LUCENE-2111:
--------------------------------------------

Towards wrapping up flex, I ran a set of tests to benchmark flex's
search performance vs trunk.

All tests use a 5M doc Wikipedia index.  Each number is the best QPS
of 5 runs, where each run executes the query repeatedly for 5.0
seconds.  Env is:
{noformat}
JAVA:
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

OS:
Linux centos 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64 
x86_64 x86_64 GNU/Linux
{noformat}
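
As a minimal sketch of the "best QPS of 5 runs, 5.0 seconds per run" measurement described above: this is not the attached benchUtil.py or flexBench.py, and the names `measure_qps`, `best_qps` and `run_query` are hypothetical. `run_query` stands in for whatever callable executes one search.

```python
import time


def measure_qps(run_query, seconds=5.0):
    """Call run_query() repeatedly for roughly `seconds`; return queries/sec."""
    count = 0
    start = time.perf_counter()
    deadline = start + seconds
    while True:
        run_query()  # hypothetical: executes one search
        count += 1
        if time.perf_counter() >= deadline:
            break
    elapsed = time.perf_counter() - start
    return count / elapsed


def best_qps(run_query, runs=5, seconds=5.0):
    """Report the highest QPS seen across `runs` independent timed runs."""
    return max(measure_qps(run_query, seconds) for _ in range(runs))
```

Taking the best of several runs, rather than the mean, is one common way to reduce the influence of GC pauses and other transient noise on the reported figure.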

The first table compares trunk against "flex on flex", i.e., a flex
index (fully reindexed after upgrading to flex):

||Query||Tot hits||Sort||QPS trunk||QPS new||Pct change||
|1|591225| |68.36|80.64|{color:green}18.0%{color}|
| | |title|64.12|68.53|{color:green}6.9%{color}|
|1 OR 2|953081| |19.35|20.80|{color:green}7.5%{color}|
| | |title|16.50|17.48|{color:green}5.9%{color}|
|1 OR 2 OR 3|1131679| |14.37|15.50|{color:green}7.9%{color}|
| | |title|12.42|13.26|{color:green}6.8%{color}|
|1 OR 2 OR 3 OR 4|1266805| |10.94|12.76|{color:green}16.6%{color}|
| | |title|10.36|11.05|{color:green}6.7%{color}|
|1 AND 2|239303| |21.19|22.32|{color:green}5.3%{color}|
| | |title|22.77|24.25|{color:green}6.5%{color}|
|1 AND 2 AND 3|109513| |18.83|19.17|{color:green}1.8%{color}|
| | |title|19.30|20.06|{color:green}3.9%{color}|
|1 AND 2 AND 3 AND 4|60795| |16.21|17.51|{color:green}8.0%{color}|
| | |title|16.75|18.29|{color:green}9.2%{color}|
|"united states"|528845| |7.54|8.54|{color:green}13.3%{color}|
| | |title|7.36|8.14|{color:green}10.6%{color}|
|"united states of america"|12144| |20.64|21.48|{color:green}4.1%{color}|
| | |title|20.45|21.06|{color:green}3.0%{color}|
|un*|2250238| |9.31|11.54|{color:green}24.0%{color}|
| | |title|8.42|10.96|{color:green}30.2%{color}|
|*ent|2482701| |0.32|0.92|{color:green}187.5%{color}|
| | |title|0.32|0.91|{color:green}184.4%{color}|
|u*t|169192| |18.53|47.97|{color:green}158.9%{color}|
| | |title|17.26|40.10|{color:green}132.3%{color}|
|uni*|1308332| |18.54|23.49|{color:green}26.7%{color}|
| | |title|16.28|20.02|{color:green}23.0%{color}|
|un*t|124623| |62.13|105.23|{color:green}69.4%{color}|
| | |title|50.38|74.99|{color:green}48.8%{color}|
|?t|554722| |0.51|29.31|{color:green}5647.1%{color}|
| | |title|0.51|26.25|{color:green}5047.1%{color}|
|??t|1605437| |0.60|6.69|{color:green}1015.0%{color}|
| | |title|0.60|6.22|{color:green}936.7%{color}|
|???t|3100067| |0.54|1.92|{color:green}255.6%{color}|
| | |title|0.53|1.89|{color:green}256.6%{color}|
|????t|2973045| |0.51|0.71|{color:green}39.2%{color}|
| | |title|0.51|0.70|{color:green}37.3%{color}|
|?????t|2323871| |0.51|0.39|{color:red}-23.5%{color}|
| | |title|0.50|0.39|{color:red}-22.0%{color}|
|??????t|2459025| |0.49|0.31|{color:red}-36.7%{color}|
| | |title|0.48|0.15|{color:red}-68.7%{color}|
|un?t|86664| |92.45|241.46|{color:green}161.2%{color}|
| | |title|72.59|151.28|{color:green}108.4%{color}|
|un??t|2860| |222.11|408.52|{color:green}83.9%{color}|
| | |title|220.91|405.84|{color:green}83.7%{color}|
|un???t|5828| |117.38|99.64|{color:red}-15.1%{color}|
| | |title|111.47|98.64|{color:red}-11.5%{color}|
|un????t|1426| |207.03|100.60|{color:red}-51.4%{color}|
| | |title|207.23|101.36|{color:red}-51.1%{color}|
|united~0.5|872873| |0.35|0.31|{color:red}-11.4%{color}|
| | |title|0.35|0.31|{color:red}-11.4%{color}|
|united~0.6|764041| |0.46|5.22|{color:green}1034.8%{color}|
| | |title|0.45|5.00|{color:green}1011.1%{color}|
|united~0.7|695756| |0.59|21.19|{color:green}3491.5%{color}|
| | |title|0.60|19.10|{color:green}3083.3%{color}|
|united~0.8|693134| |0.59|21.44|{color:green}3533.9%{color}|
| | |title|0.58|19.55|{color:green}3270.7%{color}|
|united~0.9|692299| |57.06|67.80|{color:green}18.8%{color}|
| | |title|55.28|57.87|{color:green}4.7%{color}|
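
The "Pct change" column is consistent with the relative difference between the two QPS columns. A small sketch, assuming that formula (the function name `pct_change` is hypothetical; the sample rows are copied from the table above):

```python
def pct_change(qps_trunk, qps_new):
    """Relative QPS change of flex vs trunk, in percent (positive = faster)."""
    return 100.0 * (qps_new - qps_trunk) / qps_trunk


# (query, QPS trunk, QPS new) taken from the table above
rows = [
    ("1", 68.36, 80.64),
    ("?t", 0.51, 29.31),
    ("un????t", 207.03, 100.60),
]

for query, trunk, new in rows:
    print(f"{query}: {pct_change(trunk, new):+.1f}%")
# prints:
# 1: +18.0%
# ?t: +5647.1%
# un????t: -51.4%
```

Note that the very large percentages come from rows where trunk is below 1 QPS, so small absolute changes in the baseline move the percentage a lot.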

I also ran the same queries again, this time using the trunk
(pre-flex) index with flex, i.e., to perf test the "flex on pre-flex"
emulation layer.  This is the initial experience users will see if
they upgrade to flex but don't reindex:

||Query||Tot hits||Sort||QPS trunk||QPS new||Pct change||
|1|591225| |68.36|66.91|{color:red}-2.1%{color}|
| | |title|64.12|58.47|{color:red}-8.8%{color}|
|1 OR 2|953081| |19.35|19.06|{color:red}-1.5%{color}|
| | |title|16.50|16.03|{color:red}-2.8%{color}|
|1 OR 2 OR 3|1131679| |14.37|14.14|{color:red}-1.6%{color}|
| | |title|12.42|12.11|{color:red}-2.5%{color}|
|1 OR 2 OR 3 OR 4|1266805| |10.94|11.61|{color:green}6.1%{color}|
| | |title|10.36|10.04|{color:red}-3.1%{color}|
|1 AND 2|239303| |21.19|21.12|{color:red}-0.3%{color}|
| | |title|22.77|22.46|{color:red}-1.4%{color}|
|1 AND 2 AND 3|109513| |18.83|18.81|{color:red}-0.1%{color}|
| | |title|19.30|19.29|{color:red}-0.1%{color}|
|1 AND 2 AND 3 AND 4|60795| |16.21|17.18|{color:green}6.0%{color}|
| | |title|16.75|17.46|{color:green}4.2%{color}|
|"united states"|528845| |7.54|7.63|{color:green}1.2%{color}|
| | |title|7.36|7.12|{color:red}-3.3%{color}|
|"united states of america"|12144| |20.64|19.33|{color:red}-6.3%{color}|
| | |title|20.45|19.50|{color:red}-4.6%{color}|
|un*|2250238| |9.31|9.79|{color:green}5.2%{color}|
| | |title|8.42|9.65|{color:green}14.6%{color}|
|*ent|2482701| |0.32|0.45|{color:green}40.6%{color}|
| | |title|0.32|0.45|{color:green}40.6%{color}|
|u*t|169192| |18.53|24.75|{color:green}33.6%{color}|
| | |title|17.26|21.96|{color:green}27.2%{color}|
|uni*|1308332| |18.54|19.39|{color:green}4.6%{color}|
| | |title|16.28|15.86|{color:red}-2.6%{color}|
|un*t|124623| |62.13|59.73|{color:red}-3.9%{color}|
| | |title|50.38|48.51|{color:red}-3.7%{color}|
|?t|554722| |0.51|23.65|{color:green}4537.3%{color}|
| | |title|0.51|21.42|{color:green}4100.0%{color}|
|??t|1605437| |0.60|5.13|{color:green}755.0%{color}|
| | |title|0.60|4.61|{color:green}668.3%{color}|
|???t|3100067| |0.54|1.28|{color:green}137.0%{color}|
| | |title|0.53|1.24|{color:green}134.0%{color}|
|????t|2973045| |0.51|0.55|{color:green}7.8%{color}|
| | |title|0.51|0.54|{color:green}5.9%{color}|
|?????t|2323871| |0.51|0.29|{color:red}-43.1%{color}|
| | |title|0.50|0.29|{color:red}-42.0%{color}|
|??????t|2459025| |0.49|0.18|{color:red}-63.3%{color}|
| | |title|0.48|0.21|{color:red}-56.2%{color}|
|un?t|86664| |92.45|202.48|{color:green}119.0%{color}|
| | |title|72.59|134.55|{color:green}85.4%{color}|
|un??t|2860| |222.11|187.05|{color:red}-15.8%{color}|
| | |title|220.91|186.81|{color:red}-15.4%{color}|
|un???t|5828| |117.38|69.30|{color:red}-41.0%{color}|
| | |title|111.47|68.59|{color:red}-38.5%{color}|
|un????t|1426| |207.03|60.98|{color:red}-70.5%{color}|
| | |title|207.23|60.62|{color:red}-70.7%{color}|
|united~0.5|872873| |0.35|0.23|{color:red}-34.3%{color}|
| | |title|0.35|0.23|{color:red}-34.3%{color}|
|united~0.6|764041| |0.46|3.84|{color:green}734.8%{color}|
| | |title|0.45|3.76|{color:green}735.6%{color}|
|united~0.7|695756| |0.59|17.45|{color:green}2857.6%{color}|
| | |title|0.60|15.53|{color:green}2488.3%{color}|
|united~0.8|693134| |0.59|17.56|{color:green}2876.3%{color}|
| | |title|0.58|15.97|{color:green}2653.4%{color}|
|united~0.9|692299| |57.06|56.02|{color:red}-1.8%{color}|
| | |title|55.28|49.26|{color:red}-10.9%{color}|

There are a lot of numbers to absorb... but here's my take:

  * Flex is generally faster.

  * Fuzzy queries and certain wildcard queries (using AutomatonQuery)
    are insanely faster.

  * There are certain specific wildcard corner cases where we are
    slower, but these are likely rarely used in practice (many ?'s
    followed by a suffix).

  * Flex API on a trunk index does take a perf hit, but it looks
    contained enough that we don't need to spend any time optimizing
    that emulation layer...
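
For readers unfamiliar with the wildcard syntax in the tables, here is a sketch of what the patterns match: `?` stands for exactly one character and `*` for any run of characters (including none). It translates a pattern to a Python regex purely for illustration; it is not how AutomatonQuery or trunk's wildcard enumeration is implemented, and the helper names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern ('?' = one char, '*' = any run) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, term):
    """True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(matches("un?t", "unit"))      # True: '?' consumes the 'i'
print(matches("?????t", "street"))  # True: five arbitrary chars, then 't'
print(matches("?t", "cat"))         # False: '?' matches exactly one char
```

The slower corner cases above are patterns like `?????t`, which fix only the final character and the term length.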

I also ran an indexing test (index first 10M docs of wikipedia) and
flex and trunk had similar times.

I think net/net we are good to land flex!


> Wrapup flexible indexing
> ------------------------
>
>                 Key: LUCENE-2111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2111
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: benchUtil.py, flex_backwards_merge_912395.patch, 
> flex_merge_916543.patch, flexBench.py, LUCENE-2111-EmptyTermsEnum.patch, 
> LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111_bytesRef.patch, 
> LUCENE-2111_experimental.patch, LUCENE-2111_fuzzy.patch, 
> LUCENE-2111_mtqNull.patch, LUCENE-2111_mtqTest.patch, 
> LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search 
> performance testing looks good, it survived several visits from the Unicode 
> policeman ;)
> But it still has a number of nocommits, could use some more scrutiny 
> especially on the "emulate old API on flex index" and vice versa code paths, 
> and still needs some more performance testing.  I'll do these under this 
> issue, and we should open separate issues for other self contained fixes.
> The end is in sight!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


