[
https://issues.apache.org/jira/browse/NIFI-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234547#comment-16234547
]
ASF GitHub Bot commented on NIFI-4496:
--------------------------------------
Github user andrewmlim commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2245#discussion_r148347696
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVReader.java ---
@@ -54,6 +54,26 @@
"The first non-comment line of the CSV file is a header line that contains the names of the columns. The schema will be derived by using the "
+ "column names in the header and assuming that all columns are of type String.");
+ // CSV parsers
+ public static final AllowableValue APACHE_COMMONS_CSV = new AllowableValue("commons-csv", "Apache Commons CSV",
+ "The CSV parser implementation from the Apache Commons CSV library.");
+
+ public static final AllowableValue JACKSON_CSV = new AllowableValue("jackson-csv", "Jackson CSV",
+ "The CSV parser implementation from the Jackson Dataformats library");
+
+
+ public static final PropertyDescriptor CSV_PARSER = new PropertyDescriptor.Builder()
+ .name("csv-reader-csv-parser")
+ .displayName("CSV Parser")
+ .description("Specifies which parser to use to read CSV records. NOTE: Different parsers may support different subsets of functionality, "
+ + "and/or exhibit different levels of performance.")
--- End diff --
Suggest changing the NOTE to:
Different parsers may support different subsets of functionality and may
also exhibit different levels of performance.
> Improve performance of CSVReader
> --------------------------------
>
> Key: NIFI-4496
> URL: https://issues.apache.org/jira/browse/NIFI-4496
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Priority: Major
>
> During some throughput testing, it was noted that the CSVReader was not as
> fast as desired, processing fewer than 50k records per second. A look at [this
> benchmark|https://github.com/uniVocity/csv-parsers-comparison] implies that
> the Apache Commons CSV parser (used by CSVReader) is quite slow compared to
> others.
> From that benchmark it appears that CSVReader could be enhanced by using a
> different CSV parser under the hood. Perhaps Jackson is the best choice, as
> it is fast when values are quoted, and is a mature and maintained codebase.