On 06/05/16 07:20, Emmanuel Lécharny wrote:
> On 05/05/16 22:56, Emmanuel Lécharny wrote:
>> One more thing, I have run some performance tests on DN creation for the
>> old code and the new :
>>
>> Old DN parsing for 10 000 000 DN creations :
>> --------------------------------------------
>> delta new 1 RDN  :  5.946s (dc=example<i>)
>> delta new 2 RDNs :  9.738s (dc=example<i>,dc=com)
>> delta new 3 RDNs : 12.324s (uid=<i>,dc=example,dc=com)
>> delta new 4 RDNs : 16.438s (uid=<i>,ou=people,dc=example,dc=com)
>>
>> New DN parsing for 10 000 000 DN creations :
>> --------------------------------------------
>> delta new 1 RDN  :  3.491s (70% faster)
>> delta new 2 RDNs :  7.206s (35% faster)
>> delta new 3 RDNs :  8.489s (45% faster)
>> delta new 4 RDNs : 12.654s (30% faster)
>>
>>
>> I would assume a global 30% speedup, average. I haven't tested yet the
>> complex parser, but this is very encouraging !
>>
> Complex parse is WAY WAY slower :/
>
> Still it's 15% faster in the new code. Antlr is killing us here. We are
> talking of 45 000 DN parsed per second, compared to roughly 2.8 million
> per second ( 60 times slower...) !

The reason is that the ANTLR lexer is really picky. A rule like:

    NUMERICOID_OR_ALPHA_OR_DIGIT
        : ( NUMERICOID ) => NUMERICOID { $setType(NUMERICOID); }
        | ( DIGIT ) => DIGIT { $setType(DIGIT); }
        | ( ALPHA ) => ALPHA { $setType(ALPHA); }
        ;

    protected NUMERICOID : ( "oid." )? NUMBER ( DOT NUMBER )+ ;

    protected DOT : '.' ;

    protected NUMBER : DIGIT | ( LDIGIT ( DIGIT )+ ) ;

    protected LDIGIT : '1'..'9' ;

    protected DIGIT : '0'..'9' ;

    protected ALPHA : 'a'..'z' ;

is killing performance, but I don't see how to get this fixed.
Tokenization is a clear issue in ANTLR :/
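
For comparison, here is a minimal sketch of what a hand-written version of that rule could look like in Java. It is only an illustration, not code from the API: the class OidLexerSketch, its Token type and the classify() method are hypothetical names. It mirrors the rule as quoted (lowercase ALPHA only) and assumes that a trailing dot with no NUMBER after it simply ends the OID. Each syntactic predicate in the grammar is a speculative match; here a single forward scan decides between NUMERICOID and a one-character DIGIT or ALPHA. I have not measured it, so whether it helps is an open question.

```java
/** Hypothetical hand-written counterpart of NUMERICOID_OR_ALPHA_OR_DIGIT. */
final class OidLexerSketch {
    enum Type { NUMERICOID, DIGIT, ALPHA, NONE }

    /** A token type plus the number of characters it covers. */
    static final class Token {
        final Type type;
        final int length;
        Token(Type type, int length) { this.type = type; this.length = length; }
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isAlpha(char c) { return c >= 'a' && c <= 'z'; }

    /** NUMBER : DIGIT | LDIGIT (DIGIT)+ ; returns the end index, or -1 if none. */
    private static int scanNumber(String s, int pos) {
        if (pos >= s.length() || !isDigit(s.charAt(pos))) return -1;
        if (s.charAt(pos) == '0') return pos + 1; // a leading zero stands alone
        int end = pos + 1;
        while (end < s.length() && isDigit(s.charAt(end))) end++;
        return end;
    }

    /** NUMERICOID : ("oid.")? NUMBER (DOT NUMBER)+ ; returns the end index, or -1. */
    private static int scanNumericOid(String s, int pos) {
        int start = s.startsWith("oid.", pos) ? pos + 4 : pos;
        int end = scanNumber(s, start);
        if (end < 0) return -1;
        int arcs = 0; // number of ".NUMBER" groups seen so far
        while (end < s.length() && s.charAt(end) == '.') {
            int next = scanNumber(s, end + 1);
            if (next < 0) break; // assumption: a dangling dot ends the OID
            end = next;
            arcs++;
        }
        return arcs > 0 ? end : -1; // at least one ".NUMBER" is required
    }

    /** Classifies the token starting at pos without building any lexer state. */
    static Token classify(String s, int pos) {
        if (pos < 0 || pos >= s.length()) return new Token(Type.NONE, 0);
        char c = s.charAt(pos);
        if (isDigit(c) || c == 'o') { // only these can start a NUMERICOID
            int end = scanNumericOid(s, pos);
            if (end >= 0) return new Token(Type.NUMERICOID, end - pos);
        }
        if (isDigit(c)) return new Token(Type.DIGIT, 1);
        if (isAlpha(c)) return new Token(Type.ALPHA, 1);
        return new Token(Type.NONE, 0);
    }
}
```

With this sketch, classify("2.5.4.3=x", 0) yields a NUMERICOID covering 7 characters, while classify("ou=people", 0) yields a one-character ALPHA, since "ou" does not start with "oid.".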