[
https://issues.apache.org/jira/browse/NIFI-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Timea Barna updated NIFI-10256:
-------------------------------
Status: Patch Available (was: In Progress)
> CSVRecordReader using RFC 4180 CSV format trimming starting and ending double
> quotes
> ------------------------------------------------------------------------------------
>
> Key: NIFI-10256
> URL: https://issues.apache.org/jira/browse/NIFI-10256
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Timea Barna
> Assignee: Timea Barna
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Given an input CSV file:
> scenario,name
> expected_with_space," ""John ""PA""RKINSON"""
> problematic,"""John ""PA""RKINSON"""
> expected_remove_end_quote,"""John ""PA""RKINSON"
> Based on the RFC 4180 spec:
> https://datatracker.ietf.org/doc/html/rfc4180
> " If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> "
> The output should be like this:
> [
> { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "problematic", "name" : "\"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> However, the output is like this:
> [
> { "scenario" : "expected_with_space", "name" : " \"John \"PA\"RKINSON\"" }
> ,
> { "scenario" : "problematic", "name" : "John \"PA\"RKINSON" }
> ,
> { "scenario" : "expected_remove_end_quote", "name" : "\"John \"PA\"RKINSON" }
> ]
> Notice the "problematic" row, whose input field is """John ""PA""RKINSON""".
> Based on the RFC spec it should return the value "\"John \"PA\"RKINSON\"",
> but instead it returns "John \"PA\"RKINSON", missing the starting and ending
> double quotes.
> The other two rows, expected_remove_end_quote and expected_with_space, work
> as expected given the RFC spec.
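As a cross-check of the expected values, here is a minimal sketch that parses the three rows with Python's standard csv module, whose default dialect treats a doubled double quote inside a quoted field as one literal quote. It is only a stand-in for comparison and does not use NiFi's CSVRecordReader. The names SAMPLE and parse_names are hypothetical, and the scenario labels are the ones shown in the JSON output.

```python
import csv
import io

# The three rows from the report, labelled with the scenario names
# that appear in the JSON output (assumption: illustration only).
SAMPLE = (
    'scenario,name\n'
    'expected_with_space," ""John ""PA""RKINSON"""\n'
    'problematic,"""John ""PA""RKINSON"""\n'
    'expected_remove_end_quote,"""John ""PA""RKINSON"\n'
)


def parse_names(text):
    """Map each scenario to its parsed name field.

    The csv module's default dialect has doublequote=True, so a pair of
    double quotes inside a quoted field is read as one literal quote.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {row["scenario"]: row["name"] for row in reader}


if __name__ == "__main__":
    for scenario, name in parse_names(SAMPLE).items():
        # repr() makes leading spaces and embedded quotes visible.
        print(f"{scenario}: {name!r}")
```

Under this parser the "problematic" row keeps its leading and trailing double quotes, which matches the value the report expects.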