On 12 March 2012 17:22, Emmanuel Bourg <ebo...@apache.org> wrote:
> On 12/03/2012 17:03, Benedikt Ritter wrote:
>
>> The whole logic behind CSVLexer.nextToken() is very hard to read
>> (IMHO). Maybe some refactoring would help to make it easier to
>> identify bottlenecks?
>
> Yes, I started investigating in this direction. I filed a few bugs regarding
> the behavior of the escaping that aim at clarifying the parser.
>
> I think the nextToken() method should be broken into smaller methods to help
> the JIT compiler.
>
I would start by eliminating the Token parameter. You could either
create a new token on each method call and return it, instead of
reusing the one that gets passed in, or you could use a private field
currentToken in CSVLexer. I think the object creation cost for a data
object like Token can be considered irrelevant, so creating one on each
method call will not hurt us.

> The JIT does some surprising things, I found that even unused code branches
> can have an impact on the performance. For example if simpleTokenLexer() is
> changed to not support escaped characters, the performance improves by 10%
> (the input has no escaped character). And that's not merely because an if
> statement was removed. If I add a System.out.println() in this if block that
> is never called, the performance improves as well.
>
> So any change to the parser will have to be carefully tested. Innocent
> changes can have a significant impact.
>
> Emmanuel Bourg
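
To make the first option concrete (returning a new Token from each call instead of filling one that is passed in), here is a rough sketch. The classes TokenSketch, Token and SimpleLexer are made up for illustration and are not the actual Commons CSV code; the lexer only splits on commas and newlines and ignores quoting, escaping and other edge cases such as a trailing empty field.

```java
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the real Commons CSV Token and CSVLexer classes.
final class TokenSketch {

    static final class Token {
        enum Type { TOKEN, EORECORD, EOF }

        final Type type;
        final String content;

        Token(Type type, String content) {
            this.type = type;
            this.content = content;
        }
    }

    static final class SimpleLexer {
        private final String input;
        private int pos = 0;

        SimpleLexer(String input) {
            this.input = input;
        }

        // No Token parameter: every call allocates and returns a fresh Token.
        Token nextToken() {
            if (pos >= input.length()) {
                return new Token(Token.Type.EOF, "");
            }
            int start = pos;
            // Scan up to the next delimiter or the end of the input.
            while (pos < input.length()
                    && input.charAt(pos) != ','
                    && input.charAt(pos) != '\n') {
                pos++;
            }
            String content = input.substring(start, pos);
            if (pos == input.length()) {
                // Last value of the input also ends the last record.
                return new Token(Token.Type.EORECORD, content);
            }
            char delimiter = input.charAt(pos++); // consume the delimiter
            Token.Type type = (delimiter == '\n')
                    ? Token.Type.EORECORD
                    : Token.Type.TOKEN;
            return new Token(type, content);
        }
    }
}
```

The caller would then loop with something like `Token t = lexer.nextToken();` until it sees the EOF type, without having to allocate or reset a token itself. Whether the extra allocations really are negligible would still have to be measured, given how sensitive the JIT seems to be to small changes.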