[csv] CSVLexer.isEndOfLine(int c) makes assumptions on the line separator of a CSVFormat

Benedikt Ritter Mon, 12 Mar 2012 10:17:28 -0700

Hi,

While looking for potential performance optimizations, I came across
CSVLexer.isEndOfLine(int c). Here is the source:


    private boolean isEndOfLine(int c) throws IOException {
        // check if we have \r\n...
        if (c == '\r' && in.lookAhead() == '\n') {
            // note: does not change c outside of this method !!
            c = in.read();
        }
        return (c == '\n' || c == '\r');
    }

This method assumes that a line separator is always "\n", "\r" or
"\r\n". That holds for the pre-configured CSVFormats EXCEL, TDF and
MYSQL. I'm not a pro when it comes to file encodings, but isn't it
possible that other formats or encodings use different line
separators?
If that is the case, isEndOfLine() should somehow use
format.getLineSeparator().
For example, the lookAhead would only be needed if
lineSeparator.length() > 1. That could have a positive impact on the
performance of parsing files whose line separator is only one char
long.
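
To make the idea concrete, here is a minimal, self-contained sketch. It
is not the actual CSVLexer: the class name LineSeparatorSketch, its
constructor and the countLineEnds() helper are made up for
illustration, and a plain java.io.PushbackReader stands in for the
lexer's look-ahead reader. Note that, unlike the current method, it
only recognizes the configured separator, so a lone "\r" no longer
counts as a line end when the separator is "\r\n".

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch, not the real CSVLexer. */
class LineSeparatorSketch {
    private final PushbackReader in;
    private final String lineSeparator;

    LineSeparatorSketch(String input, String lineSeparator) {
        if (lineSeparator.isEmpty()) {
            throw new IllegalArgumentException("line separator must not be empty");
        }
        // The pushback buffer must hold a partially matched separator.
        this.in = new PushbackReader(new StringReader(input), lineSeparator.length());
        this.lineSeparator = lineSeparator;
    }

    int read() throws IOException {
        return in.read();
    }

    /** Returns true if c starts the configured separator; consumes the rest of it. */
    boolean isEndOfLine(int c) throws IOException {
        if (c != lineSeparator.charAt(0)) {
            return false;
        }
        if (lineSeparator.length() == 1) {
            return true; // single-char separator: no look-ahead needed
        }
        char[] rest = new char[lineSeparator.length() - 1];
        int n = 0;
        while (n < rest.length) {
            int next = in.read();
            if (next == -1) {
                break; // input ended in the middle of a separator
            }
            rest[n++] = (char) next;
            if (next != lineSeparator.charAt(n)) {
                in.unread(rest, 0, n); // mismatch: push back what was read
                return false;
            }
        }
        if (n < rest.length) {
            in.unread(rest, 0, n);
            return false;
        }
        return true;
    }

    /** Illustration helper: counts how many separators occur in the input. */
    static int countLineEnds(String input, String lineSeparator) {
        try {
            LineSeparatorSketch lexer = new LineSeparatorSketch(input, lineSeparator);
            int count = 0;
            for (int c = lexer.read(); c != -1; c = lexer.read()) {
                if (lexer.isEndOfLine(c)) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a one-char separator the method returns right after the first
comparison; the extra reads only happen for longer separators, which
is the saving I have in mind. Whether it is measurable would need a
benchmark.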

Benedikt

