ottobackwards commented on a change in pull request #4828:
URL: https://github.com/apache/nifi/pull/4828#discussion_r580400233
##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestCSVHeaderSchemaStrategy.java
##########
@@ -66,9 +66,37 @@ public void testSimple() throws SchemaNotFoundException, IOException {
.allMatch(field ->
field.getDataType().equals(RecordFieldType.STRING.getDataType())));
}
Review comment:
Which of the two is this testing, Jackson or Apache Commons CSV?
##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/csv/TestCSVRecordReader.java
##########
@@ -592,6 +585,47 @@ public void testExtraFieldNotInHeader() throws IOException, MalformedRecordExcep
}
}
Review comment:
You may want to test with multiple duplicates: id, name, country, id, name, country.
Also, I assume case doesn't matter, but that is an assumption worth verifying: id, name, country, ID, NAME, COUNTRY.
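Not part of the PR, just a sketch of the behavior such tests would pin down; the helper class and the `findDuplicates` method here are hypothetical, and the `caseInsensitive` flag makes the case-sensitivity assumption explicit either way:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DuplicateHeaderCheck {

    // Collect every header name that appears more than once, in order of appearance.
    // caseInsensitive controls whether "id" and "ID" count as the same header.
    static List<String> findDuplicates(List<String> headers, boolean caseInsensitive) {
        Set<String> seen = new HashSet<>();
        List<String> dupes = new ArrayList<>();
        for (String h : headers) {
            String key = caseInsensitive ? h.toLowerCase(Locale.ROOT) : h;
            if (!seen.add(key)) {
                dupes.add(h);
            }
        }
        return dupes;
    }

    public static void main(String[] args) {
        // Multiple duplicates: every repeated name shows up.
        List<String> repeated = Arrays.asList("id", "name", "country", "id", "name", "country");
        System.out.println(findDuplicates(repeated, false)); // [id, name, country]

        // Mixed case: duplicates only if case is ignored.
        List<String> mixed = Arrays.asList("id", "name", "country", "ID", "NAME", "COUNTRY");
        System.out.println(findDuplicates(mixed, false)); // []
        System.out.println(findDuplicates(mixed, true));  // [ID, NAME, COUNTRY]
    }
}
```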
##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-standard-record-utils/src/main/java/org/apache/nifi/csv/CSVUtils.java
##########
@@ -136,6 +136,15 @@
.defaultValue("UTF-8")
.required(true)
.build();
+ public static final PropertyDescriptor ALLOW_DUPLICATE_HEADER_NAMES = new PropertyDescriptor.Builder()
+ .name("csvutils-allow-duplicate-header-names")
+ .displayName("Allow Duplicate Header Names")
Review comment:
Maybe this should also say what happens when duplicate headers *are* found.
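Something along these lines, say; the exact description wording is just a suggestion, sketched against the NiFi `PropertyDescriptor.Builder` API:

```java
public static final PropertyDescriptor ALLOW_DUPLICATE_HEADER_NAMES = new PropertyDescriptor.Builder()
        .name("csvutils-allow-duplicate-header-names")
        .displayName("Allow Duplicate Header Names")
        .description("Whether duplicate header names are allowed. If true, all columns are "
                + "retained even when header names repeat; if false, the record is treated as "
                + "malformed when a duplicate header name is encountered.")
        ...
```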
##########
File path: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/JacksonCSVRecordReader.java
##########
@@ -108,6 +112,17 @@ public Record nextRecord(final boolean coerceTypes, final boolean dropUnknownFie
rawFieldNames = schema.getFieldNames();
} else {
rawFieldNames = Arrays.asList(csvRecord);
+ if (rawFieldNames.size() > schema.getFieldCount() && !allowDuplicateHeaderNames) {
+ final Set<String> deDupe = new HashSet<>(schema.getFieldCount());
Review comment:
So, if I have multiple duplicate names, I'm going to have to iterate through these errors one by one, field by field.
Have you given thought to tracking the duplicates and then throwing once if there are any, including all the duplicate fields in that exception?
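Roughly something like this; not the PR's actual code, just a sketch with a hypothetical helper class, using `IllegalArgumentException` as a stand-in for whatever exception type the reader throws:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HeaderValidation {

    // Collect every duplicate header first, then throw a single exception
    // naming all of them, instead of failing on the first one found.
    static void requireUniqueHeaders(List<String> rawFieldNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : rawFieldNames) {
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                    "CSV header contains duplicate field names: " + duplicates);
        }
    }

    public static void main(String[] args) {
        requireUniqueHeaders(Arrays.asList("id", "name", "country")); // passes

        try {
            requireUniqueHeaders(Arrays.asList("id", "name", "id", "name"));
        } catch (IllegalArgumentException e) {
            // One exception reports both "id" and "name" at once.
            System.out.println(e.getMessage());
        }
    }
}
```

That way a user with five duplicated columns sees all five in one error message rather than fixing and re-running five times.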
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]