[https://issues.apache.org/jira/browse/CSV-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230965#comment-13230965]
Emmanuel Bourg commented on CSV-67:
-----------------------------------
Good point, but I'm not sure it actually happens. So far the only application I
have found that supports Unicode escapes is HSQLDB. It can read them but doesn't
write them (I checked HSQL 1.8; I'll look at 2.x). I believe these Unicode
escapes are typically created by a program like native2ascii, which converts
only non-ASCII characters. CR and LF are plain ASCII, so the line separators
should be safe.
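The reasoning that line separators are safe depends on CR (0x0D) and LF (0x0A) being ASCII, which a converter that escapes only non-ASCII characters never touches. A minimal Java sketch, assuming the converter escapes exactly the chars above 0x7F (the class and method names are hypothetical, and this is not native2ascii's actual source):

```java
public class Native2AsciiSketch {

    // Escapes every char above 0x7F as backslash, 'u' and four hex digits,
    // and copies ASCII through unchanged. This mimics, as an assumption,
    // a native2ascii-style conversion; it is not the tool's real code.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "caf\u00e9,na\u00efve\r\nx,y\r\n";
        String escaped = escapeNonAscii(csv);
        // CR (0x0D) and LF (0x0A) are ASCII, so they are copied as-is
        // and the record boundaries survive the conversion.
        System.out.println(escaped.contains("\r\n"));
        // No escaped CR appears in the output.
        System.out.println(escaped.contains("\\u000d"));
    }
}
```

Under this rule, the accented characters become escape sequences while every record separator remains a literal CRLF, which is why escaped line separators are unlikely to appear in files produced this way.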
I agree with removing the Unicode escape setting from CSVFormat. I would rather
submit the reader to [io] than make it public in [csv], though.
> UnicodeUnescapeReader should not be applied before parsing
> ----------------------------------------------------------
>
> Key: CSV-67
> URL: https://issues.apache.org/jira/browse/CSV-67
> Project: Commons CSV
> Issue Type: Bug
> Reporter: Sebb
>
> The UnicodeUnescapeReader is currently applied before the input file is parsed.
> This means that Unicode escapes are treated differently from other escapes.
> For example, the sequence <esc>r<esc>n is not treated as a newline for the
> purpose of recognising the end of a record, yet \u000D\u000A is converted to
> CRLF and would terminate the record (unless embedded in a quoted string).
> Unicode escape processing (if selected) should happen as part of parsing,
> just as ordinary escape processing does.
> The class can be made public so that users can wrap the input themselves if
> needed; this preserves the existing behaviour, so there is no need to
> introduce another setting.
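The difference between unescaping before parsing and unescaping during parsing can be sketched with a toy parser. This is a hypothetical Java example, not Commons CSV code: `unescapeAll` stands in for a reader that unescapes the whole input up front, and `parse` is a minimal comma/CRLF splitter with no quote handling. With the first ordering, an escaped CR LF pair turns into a real CRLF and ends the record; when the parser decodes the escape itself, the decoded characters can only ever be field data.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeTimingSketch {

    // Replaces every backslash-'u'-four-hex-digits sequence with the char it
    // encodes. Hypothetical stand-in for unescaping the whole input up front.
    static String unescapeAll(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            if (isUnicodeEscape(in, i)) {
                out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(in.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if a backslash, 'u' and four hex digits start at position i.
    static boolean isUnicodeEscape(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Minimal parser: ',' separates fields and a literal CR LF ends a record.
    // When decodeEscapes is true, escapes are decoded here, so the decoded
    // char is appended to the field and is never seen as a delimiter.
    // No quoting support; this is only a sketch.
    static List<List<String>> parse(String in, boolean decodeEscapes) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (decodeEscapes && isUnicodeEscape(in, i)) {
                field.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                i += 6;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
                i++;
            } else if (c == '\r' && i + 1 < in.length() && in.charAt(i + 1) == '\n') {
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                i += 2;
            } else {
                field.append(c);
                i++;
            }
        }
        // Flush a final record that lacks a trailing CRLF.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // The text holds an escaped CR and an escaped LF between "b" and "c".
        String input = "a,b\\u000D\\u000Ac,d\r\n";

        // Unescape first, then parse: the escapes become a real CRLF.
        System.out.println(parse(unescapeAll(input), false));

        // Decode while parsing: the CRLF stays inside the second field.
        System.out.println(parse(input, true).size());
    }
}
```

The first call prints `[[a, b], [c, d]]` (two records), while the second prints `1`, because the decoded CR LF ends up inside the field `b`+CRLF+`c`. Wrapping the input with the unescaping step reproduces the current behaviour, which is what making the reader class public would allow.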
--
This message is automatically generated by JIRA.