[ https://issues.apache.org/jira/browse/NIFI-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576848#comment-17576848 ]
ASF subversion and git services commented on NIFI-10256:
--------------------------------------------------------
Commit 26829e5c350766e770ccb3a7f8d3149dd5924409 in nifi's branch
refs/heads/main from Timea Barna
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=26829e5c35 ]
NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming starting and
ending double quotes
NIFI-10256 Addressing review comments, adding extra description, removing
unnecessary static import, creating extra constructor
NIFI-10256 Refactoring CSVRecordReader
NIFI-10256 Addressing review comments, adding validator
Signed-off-by: Bence Simon <[email protected]>
This closes #6234
> CSVRecordReader using RFC 4180 CSV format trimming starting and ending double
> quotes
> ------------------------------------------------------------------------------------
>
> Key: NIFI-10256
> URL: https://issues.apache.org/jira/browse/NIFI-10256
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Timea Barna
> Assignee: Timea Barna
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Given an input CSV file:
> scenario,name
> Honors escape beginning," ""John ""PA""RKINSON"""
> problematic,"""John ""PA""RKINSON"""
> honors escape end,"""John ""PA""RKINSON"
> Based on the RFC 4180 spec:
> https://datatracker.ietf.org/doc/html/rfc4180
> " If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> "
> The expected output is:
> [
> { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "problematic", "name" : "\"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> However, the actual output is:
> [
> { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "problematic", "name" : "John \"PA\"RKINSON" }
> ,
> { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> Notice the "problematic" field, whose raw value is """John ""PA""RKINSON""".
> According to the RFC spec it should be parsed as "\"John \"PA\"RKINSON\"",
> but instead it is returned as "John \"PA\"RKINSON", missing the starting
> and ending double quotes.
> The name fields of the other two records, expected_remove_end_quote and
> expected_with_space, are parsed correctly according to the RFC spec.
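The RFC 4180 rule quoted in the description, and the three expected values, can be checked with a small record splitter. This is a minimal sketch under stated assumptions, not NiFi's CSVRecordReader or any library parser. The class and method names (Rfc4180Record, parse) are hypothetical. The delimiter is assumed to be ',' and records are assumed to contain no embedded line breaks. An opening quote is recognized only as the first character of a field, each "" inside an enclosed field yields one literal quote, and only the single enclosing pair of quotes is dropped.

```java
import java.util.ArrayList;
import java.util.List;

public final class Rfc4180Record {

    /**
     * Splits a single CSV line into fields following RFC 4180.
     * Hypothetical illustration: assumes ',' as the delimiter and a record
     * without embedded line breaks.
     */
    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // currently inside an enclosed field
        boolean fieldStart = true;  // no character of the current field consumed yet
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote: "" becomes one literal "
                        i += 2;
                        continue;
                    }
                    inQuotes = false;      // the single closing quote is dropped
                } else {
                    field.append(c);
                }
            } else if (c == '"' && fieldStart) {
                inQuotes = true;           // the single opening quote is dropped
                fieldStart = false;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
                fieldStart = true;
            } else {
                field.append(c);
                fieldStart = false;
            }
            i++;
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The three input rows from the issue description, as Java string literals
        String[] rows = {
            "Honors escape beginning,\" \"\"John \"\"PA\"\"RKINSON\"\"\"",
            "problematic,\"\"\"John \"\"PA\"\"RKINSON\"\"\"",
            "honors escape end,\"\"\"John \"\"PA\"\"RKINSON\""
        };
        for (String row : rows) {
            List<String> fields = parse(row);
            // Brackets make leading spaces and outer quotes visible
            System.out.println(fields.get(0) + " -> [" + fields.get(1) + "]");
        }
    }
}
```

Running main prints each scenario with its name value in brackets. The problematic row keeps both outer quotes, which matches the expected output in the description rather than the actual one.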
--
This message was sent by Atlassian Jira
(v8.20.10#820010)