[
https://issues.apache.org/jira/browse/CSV-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053916#comment-17053916
]
Chen commented on CSV-150:
--------------------------
Unicode input with a BOM can produce '\ufffe'; in the normal case we can use
Commons IO to filter the input stream.
But in this case '\ufffe' does not appear in the first bytes, so the question is:
should we survive unclean input data?
Still, I think that mapping null to a magic char is not a good idea.
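As a minimal, JDK-only sketch of the BOM-filtering idea mentioned above (Commons IO's BOMInputStream does the equivalent for byte streams; the class and method names here are hypothetical, and note this only helps when the BOM is at the very start of the input, which is exactly the limitation raised above):

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class BomSkipper {

    // Peek at the first char; drop it if it is a BOM (U+FEFF) or a
    // byte-swapped BOM (U+FFFE), otherwise push it back untouched.
    public static Reader skipBom(Reader in) {
        try {
            PushbackReader pr = new PushbackReader(in, 1);
            int first = pr.read();
            if (first != -1 && first != '\uFEFF' && first != '\uFFFE') {
                pr.unread(first); // not a BOM: keep it in the stream
            }
            return pr;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drain a reader into a String (helper for the demo below).
    public static String readAll(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A leading BOM is stripped before the CSV parser ever sees it.
        System.out.println(readAll(skipBom(new StringReader("\uFEFFa,b,c")))); // prints "a,b,c"
    }
}
```

A '\ufffe' that occurs later in the data passes through untouched, which is why filtering alone does not fix the sentinel collision in the Lexer.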
> Escaping is not disableable
> ---------------------------
>
> Key: CSV-150
> URL: https://issues.apache.org/jira/browse/CSV-150
> Project: Commons CSV
> Issue Type: Bug
> Components: Parser
> Affects Versions: 1.1
> Reporter: Georg Tsakumagos
> Priority: Major
> Fix For: Review
>
> Attachments: CSV-150.patch
>
>
> h6. Problem
> If escaping is disabled the Lexer maps the NULL Character to the magic char
> '\ufffe'. I currently hit this char randomly with data. This leads to a
> RuntimeException inside of
> org.apache.commons.csv.Lexer.parseEncapsulatedToken(Token) with the message
> "invalid char between encapsulated token and delimiter".
> h6. Solution
> Don't map the Character to a sentinel; keep the object and null-check it.
> {code:title=Lexer.java|borderStyle=solid}
> Lexer(final CSVFormat format, final ExtendedBufferedReader reader) {
>     this.reader = reader;
>     this.delimiter = format.getDelimiter();
>     this.escape = format.getEscapeCharacter();
>     // ...
> }
>
> boolean isEscape(final int ch) {
>     return null != this.escape && escape.charValue() == ch;
> }
> {code}
> h6. Hint
> This pattern is used in other cases too. It seems to be a systematic error.
> These cases should be refactored as well.
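The null-check pattern from the patch can be exercised in isolation. A hedged, standalone sketch (the class name is illustrative, not the actual Commons CSV Lexer): with a nullable escape Character there is no sentinel value, so '\ufffe' appearing in the data can never collide with a "disabled" escape.

```java
public class EscapeDemo {

    // null means escaping is disabled -- no magic char involved.
    private final Character escape;

    public EscapeDemo(Character escape) {
        this.escape = escape;
    }

    // Null-safe check: a disabled escape matches no character at all.
    public boolean isEscape(int ch) {
        return escape != null && escape.charValue() == ch;
    }

    public static void main(String[] args) {
        EscapeDemo disabled = new EscapeDemo(null);
        EscapeDemo backslash = new EscapeDemo('\\');
        System.out.println(disabled.isEscape('\uFFFE')); // prints "false": no collision
        System.out.println(backslash.isEscape('\\'));    // prints "true"
    }
}
```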
--
This message was sent by Atlassian Jira
(v8.3.4#803005)