dan-s1 commented on code in PR #9874:
URL: https://github.com/apache/nifi/pull/9874#discussion_r2077806119
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/test/java/org/apache/nifi/excel/TestExcelRecordReader.java:
##########
@@ -122,20 +112,32 @@ public void testNonExcelFile() {
MalformedRecordException mre =
assertThrows(MalformedRecordException.class, () -> new
ExcelRecordReader(configuration, getInputStream("notExcel.txt"), logger));
final Throwable cause = mre.getCause();
- assertInstanceOf(ReadException.class, cause);
+ assertInstanceOf(ProcessException.class, cause);
}
@Test
- public void testOlderExcelFormatFile() {
- ExcelRecordReaderConfiguration configuration = new
ExcelRecordReaderConfiguration.Builder().build();
- MalformedRecordException mre =
assertThrows(MalformedRecordException.class, () -> new
ExcelRecordReader(configuration, getInputStream("olderFormat.xls"), logger));
- assertTrue(ExceptionUtils.getStackTrace(mre).contains("data appears to
be in the OLE2 Format"));
+ public void testOlderExcelFormatFile() throws MalformedRecordException {
+ final List<RecordField> fields = Arrays.asList(
Review Comment:
```suggestion
final List<RecordField> fields = List.of(
```
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/ExcelReader.java:
##########
@@ -53,13 +53,13 @@
import java.util.List;
import java.util.Map;
-@Tags({"excel", "spreadsheet", "xlsx", "parse", "record", "row", "reader",
"values", "cell"})
+@Tags({"excel", "spreadsheet", "xls", "xlsx", "parse", "record", "row",
"reader", "values", "cell"})
@CapabilityDescription("Parses a Microsoft Excel document returning each row
in each sheet as a separate record. "
+ "This reader allows for inferring a schema from all the required
sheets "
+ "or providing an explicit schema for interpreting the values."
+ "See Controller Service's Usage for further documentation. "
- + "This reader is currently only capable of processing .xlsx "
- + "(XSSF 2007 OOXML file format) Excel documents and not older .xls
(HSSF '97(-2007) file format) documents.")
+ + "This reader is capable of processing .xlsx (XSSF 2007 OOXML file
format) "
+ + "Excel documents and older .xls (HSSF '97(-2007) file format)
documents.")
Review Comment:
```suggestion
+ "This reader is capable of processing both password and non
password protected .xlsx (XSSF 2007 OOXML file format) "
+ " and older .xls (HSSF '97(-2007) file format) Excel documents.")
```
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -42,13 +44,26 @@ class RowIterator implements Iterator<Row>, Closeable {
private Row currentRow;
RowIterator(final InputStream in, final ExcelRecordReaderConfiguration
configuration, final ComponentLog logger) {
- this.workbook = StreamingReader.builder()
- .rowCacheSize(100)
- .bufferSize(4096)
- .password(configuration.getPassword())
- .setAvoidTempFiles(configuration.isAvoidTempFiles())
- .setReadSharedFormulas(true) // NOTE: If not set to true, then
data with shared formulas fail.
- .open(in);
+ if (configuration.getInputFileType() == InputFileType.XLSX) {
Review Comment:
This comment is regarding the whole if else statements between lines 47-66.
Instead of an if else, an assignment with a switch statement would be clearer
to indicate what code is needed for each Excel type. I was thinking along the
line of this:
```
this.workbook = switch(configuration.getInputFileType()) {
case XLSX -> StreamingReader.builder()
.rowCacheSize(100)
.bufferSize(4096)
.password(configuration.getPassword())
.setAvoidTempFiles(configuration.isAvoidTempFiles())
.setReadSharedFormulas(true) // NOTE: If not set to
true, then data with shared formulas fail.
.open(in);
case XLS -> {
// Providing the password to the HSSFWorkbook is done by
setting a thread variable managed by
// Biff8EncryptionKey. After the workbook is created, the
thread variable can be cleared.
Biff8EncryptionKey.setCurrentUserPassword(configuration.getPassword());
try {
yield new HSSFWorkbook(in);
} catch (final IOException e) {
throw new ProcessException("Failed to open XLS file", e);
} finally {
Biff8EncryptionKey.setCurrentUserPassword(null);
}
}
};
```
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/test/java/org/apache/nifi/excel/TestExcelRecordReader.java:
##########
@@ -318,17 +328,47 @@ void testPasswordProtected() throws Exception {
assertEquals(DATA.length, records.size());
}
+ @Test
+ void testPasswordProtectedOlderExcelFormat() throws Exception {
+ RecordSchema schema = getPasswordProtectedSchema();
+ ExcelRecordReaderConfiguration configuration = new
ExcelRecordReaderConfiguration.Builder()
+ .withSchema(schema)
+ .withPassword(PASSWORD)
+ .withAvoidTempFiles(true)
+ .withInputFileType(InputFileType.XLS)
+ .build();
+
+ InputStream inputStream = new
ByteArrayInputStream(PASSWORD_PROTECTED_OLDER_EXCEL.toByteArray());
+ ExcelRecordReader recordReader = new ExcelRecordReader(configuration,
inputStream, logger);
+ List<Record> records = getRecords(recordReader, false, false);
+
+ assertEquals(DATA.length, records.size());
+ }
Review Comment:
Instead of a new test, combine this test and the previous test line 314-329
into a JUnit 5 org.junit.jupiter.params.ParameterizedTest with an
org.junit.jupiter.params.provider.EnumSource. You can use the enum value in the
configuration and to choose which password protected workbook to use.
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/test/java/org/apache/nifi/excel/TestExcelRecordReader.java:
##########
@@ -318,17 +328,47 @@ void testPasswordProtected() throws Exception {
assertEquals(DATA.length, records.size());
}
+ @Test
+ void testPasswordProtectedOlderExcelFormat() throws Exception {
+ RecordSchema schema = getPasswordProtectedSchema();
+ ExcelRecordReaderConfiguration configuration = new
ExcelRecordReaderConfiguration.Builder()
+ .withSchema(schema)
+ .withPassword(PASSWORD)
+ .withAvoidTempFiles(true)
+ .withInputFileType(InputFileType.XLS)
+ .build();
+
+ InputStream inputStream = new
ByteArrayInputStream(PASSWORD_PROTECTED_OLDER_EXCEL.toByteArray());
+ ExcelRecordReader recordReader = new ExcelRecordReader(configuration,
inputStream, logger);
+ List<Record> records = getRecords(recordReader, false, false);
+
+ assertEquals(DATA.length, records.size());
+ }
+
@Test
void testPasswordProtectedWithoutPassword() {
RecordSchema schema = getPasswordProtectedSchema();
ExcelRecordReaderConfiguration configuration = new
ExcelRecordReaderConfiguration.Builder()
.withSchema(schema)
+ .withInputFileType(InputFileType.XLS)
.build();
InputStream inputStream = new
ByteArrayInputStream(PASSWORD_PROTECTED.toByteArray());
assertThrows(Exception.class, () -> new
ExcelRecordReader(configuration, inputStream, logger));
}
+ @Test
+ void testPasswordProtectedOlderExcelWithoutPassword() {
+ RecordSchema schema = getPasswordProtectedSchema();
+ ExcelRecordReaderConfiguration configuration = new
ExcelRecordReaderConfiguration.Builder()
+ .withSchema(schema)
+ .withInputFileType(InputFileType.XLS)
+ .build();
+
+ InputStream inputStream = new
ByteArrayInputStream(PASSWORD_PROTECTED_OLDER_EXCEL.toByteArray());
+ assertThrows(Exception.class, () -> new
ExcelRecordReader(configuration, inputStream, logger));
+ }
Review Comment:
Same as previous comment make this a Paramaterized test by combining this
test with the one on lines 348-358
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/InputFileType.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.excel;
+
+import org.apache.nifi.components.DescribedValue;
+
+public enum InputFileType implements DescribedValue {
+ XLS("XLS", "An XLS file"),
+ XLSX("XLSX", "An XLSX file");
Review Comment:
```suggestion
XLS("XLS", "HSSF '97(-2007) file format"),
XLSX("XLSX", ""XSSF 2007 OOXML file format"");
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]