[
https://issues.apache.org/jira/browse/NIFI-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292819#comment-16292819
]
ASF GitHub Bot commented on NIFI-4496:
--------------------------------------
Github user mattyb149 commented on the issue:
https://github.com/apache/nifi/pull/2245
@jdye64 I think I fixed the issue you were seeing. We have to do most of
the schema resolution/management manually, since Jackson's methods for
handling that don't seem to work for what we need. So I removed the setting
of column names on the parser; with the column names set, the parser expected
each line to be an actual array surrounded by [] (weird, right?). Then for
files without headers, I needed to make sure we used the schema field names,
so I had to adjust the logic where "rawFieldNames" is generated. Mind taking
a look at this latest version? Please and thanks!
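For illustration, the header/no-header fallback described above could look roughly like the sketch below. This is not the code from the PR: the class name, `resolveRawFieldNames`, and its parameters are hypothetical, and it uses only the JDK rather than Jackson. The only part taken from the comment is the idea that files without a header line fall back to the schema field names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RawFieldNamesSketch {

    // Hypothetical helper: decides which names label the values of each parsed row.
    // hasHeader        - whether the first line of the file is a header line (assumed flag)
    // firstRow         - the cells of the first line, already split by the parser
    // schemaFieldNames - the field names from the record schema, in order
    public static List<String> resolveRawFieldNames(boolean hasHeader,
                                                    List<String> firstRow,
                                                    List<String> schemaFieldNames) {
        if (hasHeader) {
            // Header present: take the names from the file itself, trimmed of whitespace.
            List<String> names = new ArrayList<>();
            for (String cell : firstRow) {
                names.add(cell.trim());
            }
            return names;
        }
        // No header: the first line is data, so fall back to the schema field names.
        return new ArrayList<>(schemaFieldNames);
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name");
        // File without a header: the schema supplies the names.
        System.out.println(resolveRawFieldNames(false, Arrays.asList("1", "alice"), schema));
        // File with a header: the header cells supply the names.
        System.out.println(resolveRawFieldNames(true, Arrays.asList("ID ", "Name"), schema));
    }
}
```

In the no-header case the first row is still data and has to be emitted as a record; the sketch only covers how the names are chosen.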
> Improve performance of CSVReader
> --------------------------------
>
> Key: NIFI-4496
> URL: https://issues.apache.org/jira/browse/NIFI-4496
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
>
> During some throughput testing, it was noted that the CSVReader was not as
> fast as desired, processing less than 50k records per second. A look at [this
> benchmark|https://github.com/uniVocity/csv-parsers-comparison] implies that
> the Apache Commons CSV parser (used by CSVReader) is quite slow compared to
> others.
> From that benchmark it appears that CSVReader could be enhanced by using a
> different CSV parser under the hood. Perhaps Jackson is the best choice, as
> it is fast when values are quoted, and is a mature and maintained codebase.