On 06/05/16 07:20, Emmanuel Lécharny wrote:
> On 05/05/16 22:56, Emmanuel Lécharny wrote:
>> One more thing: I have run some performance tests on DN creation for the
>> old code and the new code:
>>
>> Old DN parsing for 10 000 000 DN creations :
>> --------------------------------------------
>> delta new 1 RDN  :  5.946s  (dc=example<i>)
>> delta new 2 RDNs :  9.738s  (dc=example<i>,dc=com)
>> delta new 3 RDNs : 12.324s  (uid=<i>,dc=example,dc=com)
>> delta new 4 RDNs : 16.438s  (uid=<i>,ou=people,dc=example,dc=com)
>>
>> New DN parsing for 10 000 000 DN creations :
>> --------------------------------------------
>> delta new 1 RDN  :  3.491s (70% faster)
>> delta new 2 RDNs :  7.206s (35% faster)
>> delta new 3 RDNs :  8.489s (45% faster)
>> delta new 4 RDNs : 12.654s (30% faster)
>>
>>
>> I would assume an overall speedup of about 30% on average. I haven't
>> tested the complex parser yet, but this is very encouraging!
>>
> The complex parser is WAY WAY slower :/
>
> Still, it's 15% faster with the new code. ANTLR is killing us here. We are
> talking about 45,000 DNs parsed per second, compared to roughly 2.8 million
> per second (about 60 times slower...)!
The reason is that the ANTLR lexer is really picky. A rule like:

NUMERICOID_OR_ALPHA_OR_DIGIT
    : ( NUMERICOID ) => NUMERICOID { $setType(NUMERICOID); }
    | ( DIGIT ) => DIGIT { $setType(DIGIT); }
    | ( ALPHA ) => ALPHA { $setType(ALPHA); }
    ;
protected NUMERICOID : ( "oid." )? NUMBER ( DOT NUMBER )+ ;
protected DOT: '.' ;
protected NUMBER: DIGIT | ( LDIGIT ( DIGIT )+ ) ;
protected LDIGIT : '1'..'9' ;
protected DIGIT : '0'..'9' ;
protected ALPHA : 'a'..'z' ;

is killing performance, but I don't see how to fix it.
Tokenization is clearly a bottleneck in ANTLR :/
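
One possible direction, given here only as a rough sketch and not as what the
project actually does: the `( X ) =>` syntactic predicates in the rule above
make the lexer try each alternative speculatively before matching it for real,
so the same characters can be scanned more than once. A hand-written
classifier can decide between NUMERICOID, DIGIT and ALPHA in a single forward
scan. The class and method names below (`OidLexerSketch`, `next`,
`matchNumericOid`, `matchNumber`) are hypothetical, and the sketch assumes that
a numeric OID ends at its last complete `.NUMBER` component.

```java
// Hypothetical sketch, not part of the existing code base.
final class OidLexerSketch {
    enum Type { NUMERICOID, DIGIT, ALPHA, NONE }

    /** A token type plus the index just past the token's last character. */
    static final class Token {
        final Type type;
        final int end;

        Token(Type type, int end) {
            this.type = type;
            this.end = end;
        }
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** NUMBER : DIGIT | LDIGIT (DIGIT)+ ; returns the end index, or -1. */
    static int matchNumber(String in, int pos) {
        if (pos >= in.length() || !isDigit(in.charAt(pos))) {
            return -1;
        }
        if (in.charAt(pos) == '0') {
            return pos + 1; // a leading zero can only be a one-digit number
        }
        int p = pos + 1;
        while (p < in.length() && isDigit(in.charAt(p))) {
            p++;
        }
        return p;
    }

    /** NUMERICOID : ("oid.")? NUMBER (DOT NUMBER)+ ; returns the end index, or -1. */
    static int matchNumericOid(String in, int pos) {
        int p = in.startsWith("oid.", pos) ? pos + 4 : pos;
        p = matchNumber(in, p);
        if (p < 0) {
            return -1;
        }
        int components = 0;
        while (p < in.length() && in.charAt(p) == '.') {
            int q = matchNumber(in, p + 1);
            if (q < 0) {
                break; // trailing dot: stop at the last complete component
            }
            p = q;
            components++;
        }
        return components > 0 ? p : -1; // at least one ".NUMBER" is required
    }

    /** Classifies the token starting at pos, preferring the longest form. */
    static Token next(String in, int pos) {
        if (pos >= in.length()) {
            return new Token(Type.NONE, pos);
        }
        int oidEnd = matchNumericOid(in, pos);
        if (oidEnd >= 0) {
            return new Token(Type.NUMERICOID, oidEnd);
        }
        char c = in.charAt(pos);
        if (isDigit(c)) {
            return new Token(Type.DIGIT, pos + 1);
        }
        if (c >= 'a' && c <= 'z') {
            return new Token(Type.ALPHA, pos + 1);
        }
        return new Token(Type.NONE, pos);
    }
}
```

For example, `OidLexerSketch.next("oid.2.5.4.3=x", 0)` classifies the first 11
characters as a NUMERICOID, while `OidLexerSketch.next("ou=people", 0)` yields a
single ALPHA. Whether something like this could be plugged in front of the
generated parser is not something I have checked.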
