exceptionfactory commented on code in PR #9975:
URL: https://github.com/apache/nifi/pull/9975#discussion_r2132505128
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/ExcelHeaderSchemaStrategy.java:
##########
@@ -126,8 +130,26 @@ private List<String> getFieldNames(int firstRowIndex, Row row) throws SchemaNotF
fieldNames.add(fieldName);
}
}
+ final List<String> renamedDuplicateFieldNames = renameDuplicateFieldNames(fieldNames);
- return fieldNames;
+ return renamedDuplicateFieldNames;
+ }
+
+ private List<String> renameDuplicateFieldNames(List<String> fieldNames) {
+ final Map<String, Integer> fieldNameCounts = new HashMap<>();
+ final List<String> renamedDuplicateFieldNames = new ArrayList<>();
+
+ for (String fieldName : fieldNames) {
+ if (fieldNameCounts.containsKey(fieldName)) {
+ int count = fieldNameCounts.get(fieldName) + 1;
+ fieldNameCounts.put(fieldName, count);
+ renamedDuplicateFieldNames.add(fieldName + "_" + count);
Review Comment:
```suggestion
final int count = fieldNameCounts.get(fieldName);
renamedDuplicateFieldNames.add("%s_%d".formatted(fieldName, count));
fieldNameCounts.put(fieldName, count + 1);
```
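For reference, a minimal standalone sketch of how the loop might read with this suggestion applied. The `else` branch is not visible in the diff above, so the sketch assumes the first occurrence keeps its name and is recorded with a count of 1; the class name `DuplicateFieldNameSketch` is hypothetical and not part of the NiFi code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical standalone class for illustration only; not part of the NiFi code base.
public class DuplicateFieldNameSketch {

    static List<String> renameDuplicateFieldNames(final List<String> fieldNames) {
        final Map<String, Integer> fieldNameCounts = new HashMap<>();
        final List<String> renamedDuplicateFieldNames = new ArrayList<>();

        for (final String fieldName : fieldNames) {
            if (fieldNameCounts.containsKey(fieldName)) {
                // Suffix the duplicate with the number of earlier occurrences, then increment.
                final int count = fieldNameCounts.get(fieldName);
                renamedDuplicateFieldNames.add("%s_%d".formatted(fieldName, count));
                fieldNameCounts.put(fieldName, count + 1);
            } else {
                // Assumption: the first occurrence is kept unchanged and recorded with a count of 1.
                fieldNameCounts.put(fieldName, 1);
                renamedDuplicateFieldNames.add(fieldName);
            }
        }
        return renamedDuplicateFieldNames;
    }

    public static void main(final String[] args) {
        // Prints [Name, Name_1, Name_2]
        System.out.println(renameDuplicateFieldNames(List.of("Name", "Name", "Name")));
    }
}
```

Under that assumption the first duplicate receives the suffix `_1`. The sketch does not handle a header that already contains a literal column such as `Name_1`.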
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/ExcelHeaderSchemaStrategy.java:
##########
@@ -126,8 +130,26 @@ private List<String> getFieldNames(int firstRowIndex, Row row) throws SchemaNotF
fieldNames.add(fieldName);
}
}
+ final List<String> renamedDuplicateFieldNames = renameDuplicateFieldNames(fieldNames);
- return fieldNames;
+ return renamedDuplicateFieldNames;
+ }
+
+ private List<String> renameDuplicateFieldNames(List<String> fieldNames) {
Review Comment:
```suggestion
private List<String> renameDuplicateFieldNames(final List<String> fieldNames) {
```
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/ExcelHeaderSchemaStrategy.java:
##########
@@ -47,8 +48,11 @@ public class ExcelHeaderSchemaStrategy implements SchemaAccessStrategy {
static final int NUM_ROWS_TO_DETERMINE_TYPES = 10; // NOTE: This number is arbitrary.
static final AllowableValue USE_STARTING_ROW = new AllowableValue("Use Starting Row", "Use Starting Row",
"The configured first row of the Excel file is a header line that contains the names of the columns. The schema will be derived by using the "
- + "column names in the header of the first sheet and the following " + NUM_ROWS_TO_DETERMINE_TYPES + " rows to determine the type(s) of each column " +
- "while the configured header rows of subsequent sheets are skipped.");
+ + "column names in the header of the first sheet and the following " + NUM_ROWS_TO_DETERMINE_TYPES + " rows to determine the type(s) of each column "
+ + "while the configured header rows of subsequent sheets are skipped. "
+ + "NOTE: If there are duplicate column names then each subsequent duplicate column name is given a one up number. "
+ + "For example, column names \"Frequency\", \"Intervals\", \"Frequency\" \"Name\", \"Frequency\", \"Intervals\" will be "
+ + "changed to \"Frequency\", \"Intervals\", \"Frequency_2\" \"Name\", \"Frequency_3\", \"Intervals_2\".");
Review Comment:
Recommend simplifying the example to focus on a single column for clarity:
```suggestion
+ "For example, column names \"Name\", \"Name\" will be "
+ "changed to \"Name\", \"Name_1\"");
```