[
https://issues.apache.org/jira/browse/NIFI-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004281#comment-17004281
]
Pierre Villard commented on NIFI-6967:
--------------------------------------
As far as I can see in the code, the Apache Commons CSV parser is always used
to infer the schema from the CSV, and only afterwards is the record reader
parser selected based on the controller service configuration. The catch is
that schema inference itself has to parse the records in order to determine
the types of the fields. In your situation I'd change the Schema Access
Strategy from "Infer Schema" to "Use String Fields From Header".
The next thing to know (and the documentation would probably need to be
improved) is that you need to configure CSV Format as "Custom" to actually
tell the processor to use the properties for separator, quote character,
escape character, etc., because the tab-delimited format will use the default
ones.
By changing the configuration as described and by defining quote character and
escape characters with characters you're sure to never see in your data (like
weird symbols), I got your example working.
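To illustrate why the quote character matters, here is a minimal sketch. It is
not NiFi code: it uses Python's standard csv module as a stand-in parser, and
the sample data, the parse() helper and the chosen quote character (U+0001) are
all assumptions made up for the illustration. It assumes the tab-delimited
data contains a literal double quote that is not meant as a quoting character.

```python
import csv
import io

# Hypothetical tab-delimited sample: the "note" value on the second line
# starts with a stray double quote that is part of the data.
SAMPLE = 'id\tnote\n1\t"unbalanced quote\n2\tplain\n'


def parse(text, quotechar):
    """Parse tab-delimited text with the given quote character."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar=quotechar)
    return list(reader)


# Default-style quoting: the stray '"' opens a quoted field that is never
# closed, so the following line is swallowed into the same field.
default_rows = parse(SAMPLE, '"')

# Quote character that is assumed never to occur in the data: every line
# is parsed as its own record and the '"' is kept as ordinary text.
custom_rows = parse(SAMPLE, "\u0001")

print(len(default_rows), default_rows)
print(len(custom_rows), custom_rows)
```

With the default quote character the sample yields two rows, the last of which
contains the remainder of the input; with the unlikely quote character it
yields the expected three rows.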
I don't know if we should change the way we infer the schema - I don't have a
strong opinion about this.
I hope the above explanations provide some help for your use case.
> Choosing Jackson Parser for CSVReader Doesn't Actually Choose It
> ----------------------------------------------------------------
>
> Key: NIFI-6967
> URL: https://issues.apache.org/jira/browse/NIFI-6967
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Shawn Weeks
> Priority: Minor
> Attachments: Jackson_Bug.xml, nifi_jackson_log.txt
>
>
> While looking at NIFI-6966 I discovered that choosing Jackson CSV as the CSV
> Parser in CSVReader doesn't actually use Jackson's parser. No idea why. I've
> attached an example with the log I see.
> NiFi Version Information
> 1.10.0
> 10/29/2019 09:56:52 CDT
> Tagged nifi-1.10.0-RC3
--
This message was sent by Atlassian Jira
(v8.3.4#803005)