[
https://issues.apache.org/jira/browse/SANDBOX-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bob Smith updated SANDBOX-291:
------------------------------
Attachment: src.zip
I'll try to list most of the changes here, but I'm sure I'm forgetting some.
This should include all of the big changes at least. I focused mostly on the
parser, but I also made a few changes to the printer classes (although I don't
think I added any new test cases there).
General Changes:
- Changed all class names with "CSV" in them to use "Csv". This is how it
appears in the commons-lang "escapeCsv" methods and I think it's easier to read
the class name when acronyms are not in all upper case.
- Formatted the code. I used Eclipse with a version of the Java formatting
style that uses spaces instead of tabs and with a few other small changes to
try to make it more similar to the style of this code. The formatting was
inconsistent before (sometimes 2 space indent, sometimes 4) which made it
really hard to work on.
- Removed all deprecated methods/constructors
- Made all public classes final. If there is ever a need to create subclasses
of them then this could be changed, but I think it would be better to at least
start them as final (since once they are released as non-final it's hard to go
back).
- A few bug fixes (and test cases for them)
-----
CsvParser:
- There were a few bugs in special cases, so I made the smallest changes I
could to the parser code to fix them.
- Added a lot of test cases. I created a test case for all bugs that I found,
so even if you don't use my changes to this class you should be able to use the
test cases to find all of the same bugs.
- Added a close method.
- Renamed the nextValue method to getValue (so it is more consistent with the
getAll and getLine method names). I think I would prefer to use a different
method name prefix for all three of these (like "readAll") since I wouldn't
normally expect a "get" method to have side effects, but I didn't want to just
change the names of the most used methods.
- Changed the getLineNumber method to return the correct line number when there
are multi-line values.
- Moved all of the lexer methods into an inner CsvLexer class that is
completely independent of the CsvParser class. The methods were already
separated out, so it wasn't a very big change. I also moved the lexer test
cases into a new CsvLexerTest class.
- Got rid of the option to interpret unicode escapes. This doesn't really have
anything to do with parsing a CSV file so I think it should be left up to the
user of the class to implement this if needed. As an example, I made a
CsvParserUnicodeEscapeTest class that uses the code from the lexer in a Reader
subclass. One nice thing is that with this implementation, the interpreted
values can be used as the delimiter, encapsulator, etc.
- Got rid of the "escape" option for the same reason as the unicode escape
option. I replaced it with an encapsulator escape option that is only used as
an escape operator on the encapsulator character.
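
To illustrate the Reader-subclass idea for unicode escapes mentioned above,
here is a minimal sketch. It is not the CsvParserUnicodeEscapeTest code from
src.zip; the class name UnicodeEscapeReader is made up for this example, and
the use of PushbackReader is an assumption chosen to keep it short. It decodes
a backslash, a 'u', and four hex digits into one character before the parser
ever sees the input.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical sketch: decodes backslash-u escapes (four hex digits) on the fly.
final class UnicodeEscapeReader extends FilterReader {
    private final PushbackReader pushback;

    UnicodeEscapeReader(Reader in) {
        // Room for 'u' plus four hex digits if they have to be pushed back.
        this(new PushbackReader(in, 5));
    }

    private UnicodeEscapeReader(PushbackReader in) {
        super(in);
        this.pushback = in;
    }

    @Override
    public int read() throws IOException {
        int c = pushback.read();
        if (c != '\\') {
            return c; // ordinary character or end of stream
        }
        // Tentatively read 'u' and four hex digits.
        char[] buf = new char[5];
        int n = 0;
        while (n < buf.length) {
            int next = pushback.read();
            if (next == -1) {
                break;
            }
            buf[n++] = (char) next;
        }
        if (n == buf.length && buf[0] == 'u') {
            int value = 0;
            boolean valid = true;
            for (int i = 1; i < buf.length; i++) {
                int digit = Character.digit(buf[i], 16);
                if (digit < 0) {
                    valid = false;
                    break;
                }
                value = value * 16 + digit;
            }
            if (valid) {
                return value; // the decoded character
            }
        }
        // Not an escape: put the extra characters back and return the backslash.
        pushback.unread(buf, 0, n);
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            cbuf[off + count++] = (char) c;
        }
        return count == 0 ? -1 : count;
    }
}
```

Because the decoding happens before parsing, a decoded character can act as
the delimiter or encapsulator. The sketch does not override skip or mark, so
those still operate on the undecoded input.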
-----
ExtendedBufferedReader:
- Greatly simplified this class. I removed all the methods that weren't being
used (including keeping track of the line number) and changed the lookahead
option to use the BufferedReader mark and reset methods.
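
The mark/reset lookahead described above can be sketched as follows. This is
an illustration, not the class from src.zip; the names LookaheadReader and
lookAhead are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch of a one-character lookahead built on mark and reset.
final class LookaheadReader extends BufferedReader {

    LookaheadReader(Reader in) {
        super(in);
    }

    /** Returns the next character without consuming it, or -1 at end of stream. */
    int lookAhead() throws IOException {
        mark(1);          // remember the current position; one char of read-ahead is enough
        int c = read();   // read one character, or -1 at end of stream
        reset();          // rewind to the marked position
        return c;
    }
}
```

A call to lookAhead followed by read returns the same character twice, which
is all the lexer needs to decide how to handle the next token.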
-----
CsvStrategy:
- I split this class into three classes: an abstract base class (CsvStrategy),
a parser-specific version (CsvParseStrategy) and a printer-specific version
(CsvPrintStrategy). I didn't like that the strategy was used for both parsing
and printing even though some of the values only applied to parsing (and there
could be values that apply only to printing as well).
- Made this class immutable (as described in SANDBOX-279)
- Changed the whitespace options to not ignore whitespace by default. This is
what the document at http://www.rfc-editor.org/rfc/rfc4180.txt recommends for
the CSV format, so I think it should be like that by default. I added an
"IGNORE_WHITESPACE_STRATEGY" field that works the same as the old defaults.
- Removed the interpretUnicodeEscape option and replaced the escape field with
an encapsulatorEscape field (as described in the CsvParser change details).
- Added an ignoreEncapsulationTrailingCharacters field. This is used to either
ignore or append characters that are after an encapsulated value. Previously
an IOException was being thrown here, which I don't think is ever a good idea.
- Added some restrictions to prevent the values from being things that would
break the parser. This includes using a line break for anything or having
two equal values (other than the encapsulator and encapsulator escape).
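
As a rough illustration of the immutability and validation points above, here
is a minimal sketch. It is not the CsvParseStrategy class from src.zip; the
class name ParseStrategySketch, its constructor signature, and the
withDelimiter method are all assumptions made for this example.

```java
// Hypothetical sketch of an immutable strategy with the restrictions described above.
final class ParseStrategySketch {
    private final char delimiter;
    private final char encapsulator;
    private final char encapsulatorEscape;
    private final boolean ignoreWhitespace;

    ParseStrategySketch(char delimiter, char encapsulator,
                        char encapsulatorEscape, boolean ignoreWhitespace) {
        rejectLineBreak("delimiter", delimiter);
        rejectLineBreak("encapsulator", encapsulator);
        rejectLineBreak("encapsulatorEscape", encapsulatorEscape);
        // The encapsulator and its escape may be equal, but neither may
        // collide with the delimiter.
        if (delimiter == encapsulator || delimiter == encapsulatorEscape) {
            throw new IllegalArgumentException(
                    "delimiter must differ from the encapsulator and its escape");
        }
        this.delimiter = delimiter;
        this.encapsulator = encapsulator;
        this.encapsulatorEscape = encapsulatorEscape;
        this.ignoreWhitespace = ignoreWhitespace;
    }

    private static void rejectLineBreak(String name, char c) {
        if (c == '\n' || c == '\r') {
            throw new IllegalArgumentException(name + " must not be a line break");
        }
    }

    char getDelimiter() { return delimiter; }
    char getEncapsulator() { return encapsulator; }
    char getEncapsulatorEscape() { return encapsulatorEscape; }
    boolean isIgnoreWhitespace() { return ignoreWhitespace; }

    /** Returns a modified copy; the original instance is never changed. */
    ParseStrategySketch withDelimiter(char newDelimiter) {
        return new ParseStrategySketch(newDelimiter, encapsulator,
                encapsulatorEscape, ignoreWhitespace);
    }
}
```

With no setters, a shared default instance cannot be changed by accident, and
the validation runs again for every copy.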
-----
CsvPrinter:
- I changed this to use a modified version of the commons-lang escapeCsv method
(I hope it is ok to copy a small amount of code from one commons project to
another?). The escaping is a little different (and simpler) than the old
version, but I think the commons-lang version seems to be the best way to do it.
- I added an option to the constructor to allow disabling auto-flushing of the
output stream (similar to what is in the PrintStream class). I also reduced
the number of times the output is flushed when using the print methods that
take array input.
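
For readers unfamiliar with this style of escaping, here is a minimal sketch
in the spirit of the commons-lang escapeCsv approach. It is neither the
commons-lang code nor the modified version in src.zip; the class name
CsvEscapeSketch and the fixed comma delimiter are assumptions. A value is
quoted only if it contains a comma, a double quote, or a line break, and any
embedded double quotes are doubled.

```java
// Hypothetical sketch of quote-only-when-needed CSV escaping.
final class CsvEscapeSketch {

    private CsvEscapeSketch() {
    }

    static String escape(String value) {
        if (value == null) {
            return null;
        }
        boolean needsQuotes = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\r') >= 0
                || value.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return value; // plain values pass through unchanged
        }
        // Double every embedded quote, then wrap the whole value in quotes.
        return '"' + value.replace("\"", "\"\"") + '"';
    }
}
```

For example, escape("a,b") yields the value wrapped in double quotes, while
escape("plain") is returned unchanged.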
-----
CharBuffer:
- I didn't really make any changes other than to make it a non-public class.
> Lots of possible changes
> ------------------------
>
> Key: SANDBOX-291
> URL: https://issues.apache.org/jira/browse/SANDBOX-291
> Project: Commons Sandbox
> Issue Type: Improvement
> Components: CSV
> Affects Versions: Nightly Builds
> Reporter: Bob Smith
> Priority: Minor
> Attachments: src.zip
>
>
> I made a lot of changes to pretty much all of the classes in the csv package.
> I thought it would be better to put all of the changes here in one
> issue, but feel free to only take the parts you like (if any). Hopefully if
> nothing else the test cases will be useful to you.
> I'll attach the changes and add more details in the next post.