Hi, while looking for potential performance optimizations, I came across CSVLexer.isEndOfLine(int c). Here is the source:

    private boolean isEndOfLine(int c) throws IOException {
        // check if we have \r\n...
        if (c == '\r' && in.lookAhead() == '\n') {
            // note: does not change c outside of this method !!
            c = in.read();
        }
        return (c == '\n' || c == '\r');
    }

This method assumes that a line separator will always be "\n", "\r" or "\r\n". This is true for the pre-configured CSVFormats EXCEL, TDF and MYSQL.

I'm not a pro when it comes to file encoding, but isn't it possible that new encodings will have different line separators? If that is the case, isEndOfLine() should somehow use format.getLineSeparator(). For example, the lookAhead only has to be made if lineSeparator.length() > 1. This may have a positive impact on the performance of parsing files whose line separator is only one char long.

Benedikt
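
P.S. To make the idea a bit more concrete, here is a rough, self-contained sketch. It is not the actual CSVLexer code: LineEndSketch and countLineEnds are made-up names, a java.io.PushbackReader stands in for the lexer's reader and its lookAhead(), and the configured separator is assumed to be one or two characters long. Note that, unlike the current method, this version only accepts the configured separator.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the lexer; only the line-end check is modelled.
class LineEndSketch {
    private final PushbackReader in;
    private final String lineSeparator;

    LineEndSketch(String text, String lineSeparator) {
        // Assumption: separators are one or two chars long (e.g. "\n" or "\r\n").
        if (lineSeparator.length() < 1 || lineSeparator.length() > 2) {
            throw new IllegalArgumentException("separator must be 1 or 2 chars");
        }
        this.in = new PushbackReader(new StringReader(text), 1);
        this.lineSeparator = lineSeparator;
    }

    int read() throws IOException {
        return in.read();
    }

    // Peek at the next char without consuming it (-1 at end of input).
    private int lookAhead() throws IOException {
        int next = in.read();
        if (next != -1) {
            in.unread(next);
        }
        return next;
    }

    boolean isEndOfLine(int c) throws IOException {
        if (lineSeparator.length() == 1) {
            // One-char separator: a plain comparison, no lookahead needed.
            return c == lineSeparator.charAt(0);
        }
        // Two-char separator: look ahead only after the first char matched.
        if (c == lineSeparator.charAt(0) && lookAhead() == lineSeparator.charAt(1)) {
            in.read(); // consume the second char of the separator
            return true;
        }
        return false;
    }

    // Small driver: counts how many line ends the check finds in text.
    static int countLineEnds(String text, String lineSeparator) {
        try {
            LineEndSketch lexer = new LineEndSketch(text, lineSeparator);
            int count = 0;
            for (int c = lexer.read(); c != -1; c = lexer.read()) {
                if (lexer.isEndOfLine(c)) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a one-character separator such as "\n", the lookahead branch is never reached, which is where any saving would come from. Whether this makes a measurable difference would need benchmarking.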