[ 
https://issues.apache.org/jira/browse/LUCENE-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262648#comment-15262648
 ] 

Trejkaz commented on LUCENE-7260:
---------------------------------

Meanwhile, I added some hashCode() calls on the parsed query object, along 
these lines:
{code}
        int temp = 0;
        for (int i = 0; i < 10; i++)
        {
            long t0 = System.currentTimeMillis();
            Query query = parser2.parse(queryString, "nope");
            long t1 = System.currentTimeMillis();
            temp ^= query.hashCode();
            System.out.println("ignore: " + temp);
            System.out.println("dt: " + (t1-t0));
        }
{code}
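
For anyone who wants to reuse the pattern without a Lucene dependency, here is 
a minimal sketch of the same loop as a generic helper. ParseTimer, time() and 
buildQueryString() are hypothetical names made up for this example, and the 
string-building workload in main() is only a stand-in for 
parser2.parse(queryString, "nope"). It folds each result's hash into a sink and 
prints the sink once at the end.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Lucene or the test.
final class ParseTimer {

    /** Runs the task the given number of times and returns the elapsed ms per round. */
    static long[] time(Supplier<?> task, int rounds) {
        long[] elapsedMillis = new long[rounds];
        int sink = 0;
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            Object result = task.get();
            long t1 = System.nanoTime();
            // Use the result outside the timed region so it is not simply discarded.
            sink ^= Objects.hashCode(result);
            elapsedMillis[i] = (t1 - t0) / 1_000_000;
        }
        // Print the sink once instead of once per round to keep the output quiet.
        System.out.println("ignore: " + sink);
        return elapsedMillis;
    }

    /** Builds the same shape of query string as the test: id:( 0 OR 1 OR ... ). */
    static String buildQueryString(int terms) {
        StringBuilder builder = new StringBuilder(terms * 10);
        builder.append("id:( ");
        for (int i = 0; i < terms; i++) {
            if (i > 0) {
                builder.append(" OR ");
            }
            builder.append(i);
        }
        builder.append(" )");
        return builder.toString();
    }

    public static void main(String[] args) {
        // Stand-in workload; swap in the real parse call to time the parser.
        long[] dts = time(() -> buildQueryString(50_000), 10);
        for (long dt : dts) {
            System.out.println("dt: " + dt);
        }
    }
}
```

System.nanoTime() is used here because it is monotonic, which is an assumption 
about what is wanted; the original loop uses System.currentTimeMillis().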

I'll leave the "ignore" lines out of the output below because they just add 
noise. Both tests run faster today, but it looks like someone updated the JVM 
we're running against, so that could be the cause. These timings are for JDK 
8u92. Interesting that whatever changed in the JVM made one of the tests about 
a third faster!

3.6:
{noformat}
dt: 996
dt: 659
dt: 286
dt: 393
dt: 240
dt: 257
dt: 187
dt: 529
dt: 263
dt: 183
{noformat}

5.4:
{noformat}
dt: 20213
dt: 16613
dt: 15311
dt: 14633
dt: 14925
dt: 14571
dt: 14008
dt: 16320
dt: 15211
dt: 14881
{noformat}
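
As a rough way to compare the two runs above, the sketch below computes the 
median of each list and the ratio between them. TimingSummary is a hypothetical 
helper written for this example; the numbers are copied from the two listings. 
On these particular runs the medians come out at 274.5 ms and 15068 ms, a ratio 
of roughly 55.

```java
import java.util.Arrays;

// Hypothetical helper for comparing the two timing lists; not part of the test.
final class TimingSummary {

    // Timings in milliseconds, copied from the 3.6 and 5.4 listings.
    static final long[] LUCENE_36 =
            {996, 659, 286, 393, 240, 257, 187, 529, 263, 183};
    static final long[] LUCENE_54 =
            {20213, 16613, 15311, 14633, 14925, 14571, 14008, 16320, 15211, 14881};

    /** Median of a non-empty sample; averages the two middle values for even counts. */
    static double median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double m36 = median(LUCENE_36);
        double m54 = median(LUCENE_54);
        System.out.println("median 3.6 (ms): " + m36);
        System.out.println("median 5.4 (ms): " + m54);
        System.out.println("ratio: " + (m54 / m36));
    }
}
```

The median is used rather than the mean so that the slow first iterations, 
which presumably include JIT warm-up, do not dominate the comparison.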


> StandardQueryParser is over 100 times slower in v5 compared to v3
> -----------------------------------------------------------------
>
>                 Key: LUCENE-7260
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7260
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/queryparser
>    Affects Versions: 5.4.1
>         Environment: Java 8u51
>            Reporter: Trejkaz
>              Labels: performance
>
> The following test code times parsing a large query.
> {code}
> import org.apache.lucene.analysis.KeywordAnalyzer;
> //import org.apache.lucene.analysis.core.KeywordAnalyzer;
> import org.apache.lucene.queryParser.standard.StandardQueryParser;
> //import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
> import org.apache.lucene.search.BooleanQuery;
> public class LargeQueryTest {
>     public static void main(String[] args) throws Exception {
>         BooleanQuery.setMaxClauseCount(50_000);
>         StringBuilder builder = new StringBuilder(50_000*10);
>         builder.append("id:( ");
>         boolean first = true;
>         for (int i = 0; i < 50_000; i++) {
>             if (first) {
>                 first = false;
>             } else {
>                 builder.append(" OR ");
>             }
>             builder.append(String.valueOf(i));
>         }
>         builder.append(" )");
>         String queryString = builder.toString();
>         StandardQueryParser parser2 = new StandardQueryParser(new KeywordAnalyzer());
>         for (int i = 0; i < 10; i++) {
>             long t0 = System.currentTimeMillis();
>             parser2.parse(queryString, "nope");
>             long t1 = System.currentTimeMillis();
>             System.out.println(t1-t0);
>         }
>     }
> }
> {code}
> For Lucene 3.6.2, the timings settle down to 200~300 ms, with the fastest 
> being 207 ms.
> For Lucene 5.4.1, the timings settle down to 20000~30000 ms, with the fastest 
> being 22444 ms.
> So at some point, some change made the query parser 100 times slower. I would 
> suspect that it has something to do with how the list of children is now 
> handled. Every time someone gets the children, it copies the list. Every time 
> someone sets the children, it walks through to detach parent references and 
> then reattaches them all again.
> If it were me, I would probably make these collections immutable so that I 
> didn't have to defensively copy them.
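
To make the contrast in the description concrete, here is a minimal sketch of 
the two approaches to holding a node's children. CopyingNode and ImmutableNode 
are hypothetical classes written for this example; they are not Lucene's query 
node classes and only mirror the behaviour described above (copy on every get, 
detach and reattach on every set) next to the immutable alternative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical node that copies defensively, as described in the report.
final class CopyingNode {
    private List<CopyingNode> children = new ArrayList<>();
    private CopyingNode parent;

    // Every read copies the whole list, so each call costs O(n).
    List<CopyingNode> getChildren() {
        return new ArrayList<>(children);
    }

    // Every write detaches all old children, copies, then reattaches all new ones.
    void setChildren(List<CopyingNode> newChildren) {
        for (CopyingNode child : children) {
            child.parent = null;
        }
        children = new ArrayList<>(newChildren);
        for (CopyingNode child : children) {
            child.parent = this;
        }
    }

    CopyingNode getParent() {
        return parent;
    }
}

// Hypothetical immutable alternative: one copy at construction, none afterwards.
final class ImmutableNode {
    private final List<ImmutableNode> children;

    ImmutableNode(List<ImmutableNode> children) {
        // Copy once so later changes to the caller's list cannot leak in,
        // then wrap so callers cannot modify the stored list.
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    // Safe to hand out directly: the list can never change.
    List<ImmutableNode> getChildren() {
        return children;
    }
}

final class ChildrenDemo {
    public static void main(String[] args) {
        ImmutableNode leaf = new ImmutableNode(Collections.<ImmutableNode>emptyList());
        ImmutableNode root = new ImmutableNode(Collections.singletonList(leaf));
        // Same list instance on every call, no per-call copying.
        System.out.println(root.getChildren() == root.getChildren());
    }
}
```

With a tree of 50,000 siblings, a processor that calls getChildren() once per 
child on the copying variant does quadratic work overall, which is one way the 
slowdown suspected above could arise. Whether that is what actually happens 
inside the parser is not something this sketch demonstrates.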


