On 04/06/2016 01:05 AM, Emmanuel Lécharny wrote:
> So for the record, after a couple of hours working on it tonight, I got
> the DeepTrimToLowerNormalizer() working fine, with tests passing.
>
> I was also able to improve the performance of the beast: from 20
> seconds to normalize 10 000 000 Strings like "xs crvtbynU
> Jikl7897790", down to 4.3s. I just assumed that most of the time we
> will deal with chars between 0x00 and 0x7F, and wrote a specific
> function for that case. If we have chars above 0x7F, an exception is
> thrown and we fall back to the complex process, which then takes
> 47s instead of 20s.
>
> So this is a trade-off:
> - we have an implementation that covers all the chars and takes 20s for
>   10M Strings
> - we have an implementation that processes the String only if its chars
>   are in [0x00, 0x7F] and takes 4.3s for 10M Strings, but takes 47
>   seconds if we have a char outside this range.
>
> Besides the obvious gain, there is another reason why I wanted to do
> this: processing IA5String values will benefit from this separation,
> and that covers numerous AttributeTypes (like mail, homeDirectory,
> memberUid, krb5principalname, krb5Realmname, and a lot more).
>
> wdyt? Go for an average of 20s no matter what, or accept a huge
> penalty when the String contains non-ASCII chars?
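To make the two-tier idea concrete, here is a minimal sketch of the scheme described above (names and details are illustrative, not the actual ApacheDS implementation): a fast ASCII-only trim/collapse/lower-case pass that throws as soon as it sees a char above 0x7F, with a catch that falls back to a stand-in for the full Unicode path.

```java
// Hypothetical sketch of the fast-path-with-exception approach.
// Not the real DeepTrimToLowerNormalizer; a simplified stand-in.
public class FastNormalizer {
    /** Fast path: trim, collapse inner spaces, lower-case, ASCII only.
     *  Throws when a char above 0x7F is encountered. */
    static String normalizeAscii(String s) {
        char[] out = new char[s.length()];
        int pos = 0;
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                // Non-ASCII char: signal the caller to fall back
                throw new IllegalArgumentException("non-ASCII char at " + i);
            }
            if (c == ' ') {
                pendingSpace = pos > 0; // drop leading spaces
            } else {
                if (pendingSpace) { out[pos++] = ' '; pendingSpace = false; }
                out[pos++] = (c >= 'A' && c <= 'Z') ? (char) (c + 32) : c;
            }
        }
        return new String(out, 0, pos); // trailing spaces never emitted
    }

    static String normalize(String s) {
        try {
            return normalizeAscii(s);
        } catch (IllegalArgumentException e) {
            // Simplified stand-in for the full Unicode normalization path
            return s.trim().toLowerCase().replaceAll(" +", " ");
        }
    }
}
```

The 47s-vs-20s penalty comes from scanning part of the String twice plus the cost of constructing and catching the exception on every non-ASCII value.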
I'd go for the 2nd, optimized way. Is the cause of the penalty only the
exception throw/catch? Then maybe it's worth testing whether it improves
when not throwing an exception but returning a special flag (like null)
and checking for that?

Kind Regards,
Stefan
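A minimal sketch of that flag-based variant (again hypothetical names, not the actual ApacheDS code): the fast path returns null when it meets a char above 0x7F, and the caller branches on that instead of paying for an exception's construction and stack-trace fill.

```java
// Hypothetical sketch of the null-flag suggestion above.
public class FlagNormalizer {
    /** Fast path: trim, collapse spaces, lower-case; returns null as a
     *  flag when a char above 0x7F is seen, avoiding exception cost. */
    static String normalizeAsciiOrNull(String s) {
        char[] out = new char[s.length()];
        int pos = 0;
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                return null; // flag: caller must fall back to the slow path
            }
            if (c == ' ') {
                pendingSpace = pos > 0; // drop leading spaces
            } else {
                if (pendingSpace) { out[pos++] = ' '; pendingSpace = false; }
                out[pos++] = (c >= 'A' && c <= 'Z') ? (char) (c + 32) : c;
            }
        }
        return new String(out, 0, pos);
    }

    static String normalize(String s) {
        String fast = normalizeAsciiOrNull(s);
        // Simplified stand-in for the full Unicode normalization path
        return fast != null ? fast : s.trim().toLowerCase().replaceAll(" +", " ");
    }
}
```

If keeping the exception is preferred for API reasons, another common trick is a pre-allocated exception (or one constructed with writableStackTrace=false) so no stack trace is filled in on the hot path.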
