[ https://issues.apache.org/jira/browse/NIFI-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576848#comment-17576848 ]

ASF subversion and git services commented on NIFI-10256:
--------------------------------------------------------

Commit 26829e5c350766e770ccb3a7f8d3149dd5924409 in nifi's branch 
refs/heads/main from Timea Barna
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=26829e5c35 ]

NIFI-10256 CSVRecordReader using RFC 4180 CSV format trimming starting and 
ending double quotes

NIFI-10256 Addressing review comments, adding extra description, removing 
unnecessary static import, creating extra constructor

NIFI-10256 Refactoring CSVRecordReader

NIFI-10256 Addressing review comments, adding validator

Signed-off-by: Bence Simon <[email protected]>
This closes #6234


> CSVRecordReader using RFC 4180 CSV format trimming starting and ending double 
> quotes
> ------------------------------------------------------------------------------------
>
>                 Key: NIFI-10256
>                 URL: https://issues.apache.org/jira/browse/NIFI-10256
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Timea Barna
>            Assignee: Timea Barna
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Given an input CSV file:
> scenario,name
> Honors escape beginning," ""John ""PA""RKINSON"""
> problematic,"""John ""PA""RKINSON"""
> honors escape end,"""John ""PA""RKINSON"
> Based on the RFC 4180 spec:
> https://datatracker.ietf.org/doc/html/rfc4180
> " If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> "
> The expected output is:
> [
> { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "problematic", "name" : "\"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> However the output is like this"
> [
> { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "problematic", "name" : "John \"PA\"RKINSON" }
> ,
> { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> Notice the "problematic" field, whose raw value is """John ""PA""RKINSON""".
> Based on the RFC spec it should be parsed as "\"John \"PA\"RKINSON\"",
> but instead it is returned as "John \"PA\"RKINSON", missing the starting
> and ending double quotes.
> The other two fields, expected_remove_end_quote and expected_with_space,
> are parsed as expected according to the RFC spec.
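
To illustrate why the "problematic" value should keep its outer quotes, here is a minimal Python sketch of RFC 4180 quoted-field decoding. It is not NiFi's CSVRecordReader nor Apache Commons CSV; the function name parse_rfc4180_line is hypothetical. Assumptions: one line at a time, no embedded line breaks, and the leading-space row is left out because RFC 4180 does not define how a field starting with a space followed by quotes should be read.

```python
def parse_rfc4180_line(line: str, delimiter: str = ",") -> list[str]:
    """Split one CSV line into fields, decoding RFC 4180 quoted fields.

    Hypothetical sketch: a field that starts with a double quote is
    enclosed, so the enclosing quotes are dropped and each doubled quote
    ("") inside it becomes one literal quote. Other fields are returned
    verbatim. Embedded line breaks are not handled.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: consume characters until an unescaped closing quote.
            i += 1
            buf = []
            while i < n:
                ch = line[i]
                if ch == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')  # escaped quote ("") -> one literal quote
                        i += 2
                        continue
                    i += 1  # closing quote of the enclosed field
                    break
                buf.append(ch)
                i += 1
            fields.append("".join(buf))
            # RFC 4180 allows nothing between the closing quote and the next
            # delimiter; this sketch simply ignores any such characters.
            end = line.find(delimiter, i)
        else:
            # Unquoted field: everything up to the next delimiter, verbatim.
            end = line.find(delimiter, i)
            fields.append(line[i:] if end == -1 else line[i:end])
        if end == -1:
            return fields
        i = end + 1


for row in ['problematic,"""John ""PA""RKINSON"""',
            'honors escape end,"""John ""PA""RKINSON"']:
    print(parse_rfc4180_line(row))
# ['problematic', '"John "PA"RKINSON"']
# ['honors escape end', '"John "PA"RKINSON']
```

In the "problematic" row, the first and last quote characters only enclose the field, and each remaining "" pair collapses to a single quote, so the decoded value still begins and ends with a literal double quote.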



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
