[
https://issues.apache.org/jira/browse/SANDBOX-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224219#comment-13224219
]
Emmanuel Bourg commented on SANDBOX-291:
----------------------------------------
There are a lot of good suggestions, I moved the details in the description of
the issue so we can update it as we make progress integrating the changes.
> Lots of possible changes
> ------------------------
>
> Key: SANDBOX-291
> URL: https://issues.apache.org/jira/browse/SANDBOX-291
> Project: Commons Sandbox
> Issue Type: Improvement
> Components: CSV
> Affects Versions: Nightly Builds
> Reporter: Bob Smith
> Priority: Minor
> Attachments: src.zip
>
>
> I made a lot of changes to pretty much all of the classes in the csv package.
> I thought it would be better to put all of the the changes here in one
> issue, but feel free to only take the parts you like (if any). Hopefully if
> nothing else the test cases will be useful to you.
> I'll try to list most of the changes here, but I'm sure I'm forgetting some.
> This should include all of the big changes at least. I focused mostly on the
> parser, but I also made a few changes to the printer classes (although I
> don't think I added any new test cases there).
> h3. General Changes
> - Changed all class names with "CSV" in them to use "Csv". This is how it
> appears in the commons-lang "escapeCsv" methods and I think it's easier to
> read the class name when acronyms are not in all upper case.
> - Formatted the code. I used Eclipse with a version of the Java formatting
> style that uses spaces instead of tabs and with a few other small changes to
> try to make it more similar to the style of this code. The formatting was
> inconsistent before (sometimes 2 space indent, sometimes 4) which made it
> really hard to work on.
> - Removed all deprecated methods/constructors
> - Made all public classes final. If there is ever a need to create subclasses
> of them then this could be changed, but I think it would be better to at
> least start them as final (since once they are released as non-final it's
> hard to go back).
> - A few bug fixes (and test cases for them)
> h3. CsvParser
> - There were a few bugs for special cases, so I made as small of changes as I
> could to the parser code to fix these.
> - Added a lot of test cases. I created a test case for all bugs that I found,
> so even if you don't use my changes to this class you should be able to use
> the test cases to find all of the same bugs.
> - Added a close method.
> - Renamed the nextValue method to getValue (so it is more consistent with the
> getAll and getLine method names). I think I would prefer to use a different
> method name prefix for all three of these (like "readAll") since I wouldn't
> normally expect a "get" method to have side effects, but I didn't want to
> just change the names of the most used methods.
> - Changed the getLineNumber method to return the correct line number when
> there are multi-line values.
> - Moved all of the lexer methods into an inner CsvLexer class that is
> completely independent of the CsvParser class. The methods were already
> separated out, so it wasn't a very big change. I also moved the lexer test
> cases into a new CsvLexerTest class.
> - Got rid of the interpreting unicode escape options. This doesn't really
> have anything to do with parsing a CSV file so I think it should be left up
> to the user of the class to implement this if needed. As an example, I made a
> CsvParserUnicodeEscapeTest class that uses the code from the lexer in a
> Reader subclass. One nice thing is that with this implementation, the
> interpretted values can be used as the delimiter, encapsulator, etc.
> - Got rid of the "escape" option for the same reason as the unicode escape
> option. I replaced it with an encapsulator escape option that is only used as
> an escape operator on the encapsulator character.
> h3. ExtendedBufferedReader
> - Greatly simplified this class. I removed all the methods that weren't being
> used (including keeping track of the line number) and changed the lookahead
> option to use the BufferedReader mark and reset methods.
> h3. CsvStrategy
> - I split this class into three classes: an abstract base class
> (CsvStrategy), a parser-specific version (CsvParseStrategy) and a
> printer-specific version (CsvPrintStrategy). I didn't like that the strategy
> was used for both parsing and printing even though some of the values only
> applied to parsing (and there could be values that apply only to printing as
> well).
> - Made this class immutable (as described in SANDBOX-279)
> - Changed the whitespace options to not ignore whitespace by default. This is
> what the document at http://www.rfc-editor.org/rfc/rfc4180.txt recommends for
> the CSV format, so I think it should be like that by default. I added an
> "IGNORE_WHITESPACE_STRATEGY" field that works the same as the old defaults.
> - Removed the interpretUnicodeEscape option and replaced the escape field
> with an encapsuatorEscape field (as described in the CsvParser change
> details).
> - Added an ignoreEncapsulationTrailingCharacters field. This is used to
> either ignore or append characters that are after an encapsulated value.
> Previously an IOException was being thrown here, which I don't think is ever
> a good idea.
> - Added some restrictions to prevent the values from being things that would
> break the parser. This includes using a line break for anything or having
> equal two values (other than the encapsulator and encapsulator escape).
> h3. CsvPrinter
> - I changed this to use a modified version of the commons-lang escapeCsv
> method (I hope it is ok to copy a small amount of code from one commons
> project to another?). The escaping is a little different (and simpler) that
> the old version, but I think the commons-lang version seems to be the best
> way to do it.
> - I added an option to the constructor to allow disabling auto-flushing of
> the output stream (similar to what is in the PrintStream class). I also
> reduced the number of times the output is flushed when using the print method
> that take array input.
> h3. CharBuffer
> - I didn't really make any changes other than to make it a non-public class.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira