Re: [csv] Performance comparison

Benedikt Ritter Mon, 12 Mar 2012 09:35:52 -0700

On 12 March 2012 17:22, Emmanuel Bourg <ebo...@apache.org> wrote:
> On 12/03/2012 17:03, Benedikt Ritter wrote:
>
>
>> The whole logic behind CSVLexer.nextToken() is very hard to read
>> (IMHO). Maybe some refactoring would help to make it easier to
>> identify bottlenecks?
>
>
> Yes I started investigating in this direction. I filed a few bugs regarding
> the behavior of the escaping that aim at clarifying the parser.
>
> I think the nextToken() method should be broken into smaller methods to help
> the JIT compiler.
>


I would start by eliminating the Token parameter. You could either
create a new token on each method call and return it, instead of
reusing the one that gets passed in, or you could use a private field
currentToken in CSVLexer. But I think the object creation cost for a
data object like Token can be considered irrelevant (so creating one
in each method call will not hurt us).
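
To make the first option concrete, here is a rough sketch of a lexer
whose nextToken() takes no Token parameter and returns a new object on
every call. SketchToken and SketchLexer are made-up, heavily simplified
stand-ins (no quoting, no escaping, no comments), not the real Token
and CSVLexer classes, so treat this as an illustration of the API shape
only:

```java
// Hypothetical stand-in for Token: an immutable data object.
final class SketchToken {
    enum Type { TOKEN, EORECORD, EOF }

    final Type type;
    final String content;

    SketchToken(Type type, String content) {
        this.type = type;
        this.content = content;
    }
}

// Hypothetical stand-in for CSVLexer, reading from an in-memory string.
final class SketchLexer {
    private final CharSequence input;
    private int pos = 0;

    SketchLexer(CharSequence input) {
        this.input = input;
    }

    // No Token parameter: each call allocates and returns a new token.
    SketchToken nextToken() {
        StringBuilder content = new StringBuilder();
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c == ',') {
                // Field delimiter: more fields follow in this record.
                return new SketchToken(SketchToken.Type.TOKEN, content.toString());
            }
            if (c == '\n') {
                // Line break: this was the last field of the record.
                return new SketchToken(SketchToken.Type.EORECORD, content.toString());
            }
            content.append(c);
        }
        // End of input: return whatever was collected as the final token.
        return new SketchToken(SketchToken.Type.EOF, content.toString());
    }
}
```

The caller then simply loops on nextToken() until it sees EOF. Whether
the extra allocation really is negligible would of course have to be
confirmed with the benchmark.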

> The JIT does some surprising things, I found that even unused code branches
> can have an impact on the performance. For example if simpleTokenLexer() is
> changed to not support escaped characters, the performance improves by 10%
> (the input has no escaped character). And that's not merely because an if
> statement was removed. If I add a System.out.println() in this if block that
> is never called, the performance improves as well.
>
> So any change to the parser will have to be carefully tested. Innocent
> changes can have a significant impact.
>
>
> Emmanuel Bourg
>
