ottobackwards commented on a change in pull request #4828:
URL: https://github.com/apache/nifi/pull/4828#discussion_r580400233
##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestCSVHeaderSchemaStrategy.java
##########
@@ -66,9 +66,37 @@ public void testSimple() throws SchemaNotFoundException, IOException {
.allMatch(field ->
field.getDataType().equals(RecordFieldType.STRING.getDataType())));
}
Review comment:
Which of the two is this testing, Jackson or Apache Commons CSV?
##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestCSVRecordReader.java
##########
@@ -592,6 +585,47 @@ public void testExtraFieldNotInHeader() throws IOException, MalformedRecordExcep
}
}
Review comment:
You may want to test with multiple duplicates: id, name, country, id, name, country.
Also, I assume case doesn't matter, but that is an assumption worth verifying: id, name, country, ID, NAME, COUNTRY.
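Not part of the PR, just a sketch of the behavior such tests would pin down; the helper class and the `findDuplicates` method here are hypothetical, and the `caseInsensitive` flag makes the case-sensitivity assumption explicit either way:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DuplicateHeaderCheck {

    // Collect every header name that appears more than once, in order of appearance.
    // caseInsensitive controls whether "id" and "ID" count as the same header.
    static List<String> findDuplicates(List<String> headers, boolean caseInsensitive) {
        Set<String> seen = new HashSet<>();
        List<String> dupes = new ArrayList<>();
        for (String h : headers) {
            String key = caseInsensitive ? h.toLowerCase(Locale.ROOT) : h;
            if (!seen.add(key)) {
                dupes.add(h);
            }
        }
        return dupes;
    }

    public static void main(String[] args) {
        // Multiple duplicates: every repeated name shows up.
        List<String> repeated = Arrays.asList("id", "name", "country", "id", "name", "country");
        System.out.println(findDuplicates(repeated, false)); // [id, name, country]

        // Mixed case: duplicates only if case is ignored.
        List<String> mixed = Arrays.asList("id", "name", "country", "ID", "NAME", "COUNTRY");
        System.out.println(findDuplicates(mixed, false)); // []
        System.out.println(findDuplicates(mixed, true));  // [ID, NAME, COUNTRY]
    }
}
```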
##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-standard-record-utils/src/main/java/org/apache/nifi/csv/CSVUtils.java
##########
@@ -136,6 +136,15 @@
.defaultValue("UTF-8")
.required(true)
.build();
+ public static final PropertyDescriptor ALLOW_DUPLICATE_HEADER_NAMES = new PropertyDescriptor.Builder()
+ .name("csvutils-allow-duplicate-header-names")
+ .displayName("Allow Duplicate Header Names")
Review comment:
Maybe this should also say what happens when duplicate headers *are* found.
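Something along these lines, say; the exact description wording is just a suggestion, sketched against the NiFi `PropertyDescriptor.Builder` API:

```java
public static final PropertyDescriptor ALLOW_DUPLICATE_HEADER_NAMES = new PropertyDescriptor.Builder()
        .name("csvutils-allow-duplicate-header-names")
        .displayName("Allow Duplicate Header Names")
        .description("Whether duplicate header names are allowed. If true, all columns are "
                + "retained even when header names repeat; if false, the record is treated as "
                + "malformed when a duplicate header name is encountered.")
        ...
```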
##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/JacksonCSVRecordReader.java
##########
@@ -108,6 +112,17 @@ public Record nextRecord(final boolean coerceTypes, final boolean dropUnknownFie
rawFieldNames = schema.getFieldNames();
} else {
rawFieldNames = Arrays.asList(csvRecord);
+ if (rawFieldNames.size() > schema.getFieldCount() && !allowDuplicateHeaderNames) {
+ final Set<String> deDupe = new HashSet<>(schema.getFieldCount());
Review comment:
So, if I have multiple duplicate names, I'm going to have to iterate through these errors one by one, field by field.
Have you given thought to tracking the duplicates and then throwing once if there are any, including all the duplicate fields in that exception?
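Roughly something like this; not the PR's actual code, just a sketch with a hypothetical helper class, using `IllegalArgumentException` as a stand-in for whatever exception type the reader throws:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HeaderValidation {

    // Collect every duplicate header first, then throw a single exception
    // naming all of them, instead of failing on the first one found.
    static void requireUniqueHeaders(List<String> rawFieldNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : rawFieldNames) {
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                    "CSV header contains duplicate field names: " + duplicates);
        }
    }

    public static void main(String[] args) {
        requireUniqueHeaders(Arrays.asList("id", "name", "country")); // passes

        try {
            requireUniqueHeaders(Arrays.asList("id", "name", "id", "name"));
        } catch (IllegalArgumentException e) {
            // One exception reports both "id" and "name" at once.
            System.out.println(e.getMessage());
        }
    }
}
```

That way a user with five duplicated columns sees all five in one error message rather than fixing and re-running five times.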
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]