exceptionfactory commented on code in PR #9874:
URL: https://github.com/apache/nifi/pull/9874#discussion_r2052503679
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -42,13 +51,26 @@ class RowIterator implements Iterator<Row>, Closeable {
private Row currentRow;
RowIterator(final InputStream in, final ExcelRecordReaderConfiguration configuration, final ComponentLog logger) {
- this.workbook = StreamingReader.builder()
- .rowCacheSize(100)
- .bufferSize(4096)
- .password(configuration.getPassword())
- .setAvoidTempFiles(configuration.isAvoidTempFiles())
- .setReadSharedFormulas(true) // NOTE: If not set to true, then data with shared formulas fail.
- .open(in);
+ if (isXSSFExcelFile(in, configuration.getPassword())) {
+ this.workbook = StreamingReader.builder()
+ .rowCacheSize(100)
+ .bufferSize(4096)
+ .password(configuration.getPassword())
+ .setAvoidTempFiles(configuration.isAvoidTempFiles())
+ .setReadSharedFormulas(true) // NOTE: If not set to true, then data with shared formulas fail.
+ .open(in);
+ } else {
+ // Providing the password to the HSSFWorkbook is done by setting a thread variable managed by
+ // Biff8EncryptionKey. After the workbook is created, the thread variable can be cleared.
+ Biff8EncryptionKey.setCurrentUserPassword(configuration.getPassword());
Review Comment:
This is problematic for multi-threaded use of ExcelReader, which is a realistic
scenario. Given that, it seems best not to attempt to support encrypted XLS
files right now. `HSSFWorkbook` will likely throw exceptions when it attempts
to read such files, but that is acceptable, since it is a limitation of the
library.
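The diff's own comment says the password reaches `HSSFWorkbook` through a thread variable managed by `Biff8EncryptionKey`. The following JDK-only sketch illustrates one way a thread-scoped password can go wrong when reads run on pooled threads: if a task fails before clearing the value, the next unrelated task on the same thread inherits it. `PasswordHolder` is a hypothetical stand-in, not the POI class, and the read failure is simulated.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalPasswordLeak {

    // Hypothetical stand-in for a thread-scoped password holder such as
    // POI's Biff8EncryptionKey; this is NOT the real POI class.
    static final class PasswordHolder {
        private static final ThreadLocal<String> PASSWORD = new ThreadLocal<>();

        static void set(final String password) {
            if (password == null) {
                PASSWORD.remove();
            } else {
                PASSWORD.set(password);
            }
        }

        static String get() {
            return PASSWORD.get();
        }
    }

    // Simulates reading one encrypted file: the password is set, then the read
    // fails before any code that would clear it is reached.
    static void readWithPasswordAndFail(final String password) {
        PasswordHolder.set(password);
        throw new IllegalStateException("simulated read failure");
    }

    // Returns the password that a later, unrelated task observes on the same pooled thread.
    static String passwordSeenByNextTask() {
        // One worker thread, so both tasks are guaranteed to run on the same thread.
        final ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            final Runnable failingRead = () -> readWithPasswordAndFail("secret-for-file-A");
            try {
                pool.submit(failingRead).get();
            } catch (final ExecutionException expected) {
                // The first task failed and left its password on the worker thread.
            }
            // The second task never sets a password, yet it reads the first task's value.
            final Callable<String> readPassword = PasswordHolder::get;
            return pool.submit(readPassword).get();
        } catch (final InterruptedException | ExecutionException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            throw new IllegalStateException("demo did not complete", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(final String[] args) {
        // prints "Next task sees: secret-for-file-A"
        System.out.println("Next task sees: " + passwordSeenByNextTask());
    }
}
```

Clearing the value in a `finally` block would avoid this particular leak, but the sketch only shows the hazard; it does not settle whether encrypted XLS support is worth the added risk.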
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -129,4 +151,37 @@ private boolean hasExhaustedRows() {
}
return exhausted;
}
+
+ private boolean isXSSFExcelFile(InputStream in, final String password) {
+ final FileMagic fileType;
+ try {
+ fileType = FileMagic.valueOf(in);
+ } catch (final IOException e) {
+ throw new ProcessException("Failed to determine file type", e);
+ }
+
+ if (fileType == FileMagic.OOXML) {
+ // Unencrypted XLSX file
+ return true;
+ }
+ if (fileType != FileMagic.OLE2) {
+ // Not an Excel file
+ return false;
+ }
+ if (password == null) {
+ // If there is no password on the OLE2 file, then it's an XLS file. Otherwise, the type must be further
+ // investigated.
+ return false;
+ }
Review Comment:
It would be helpful to refactor this method to use a single return.
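A minimal sketch of the single-return shape suggested above. It uses a hypothetical `FileType` enum in place of POI's `FileMagic` so that it compiles without POI. The OLE2-with-password branch continues past the quoted hunk, so it is modelled here as a hypothetical `BooleanSupplier`; in the real method, that branch would hold the `POIFSFileSystem` inspection.

```java
import java.util.function.BooleanSupplier;

public class SingleReturnSketch {

    // Hypothetical stand-in for the FileMagic values the PR checks; not POI's enum.
    enum FileType { OOXML, OLE2, OTHER }

    // Single-return version of the branching in the PR's isXSSFExcelFile.
    // encryptedContainerCheck is a hypothetical hook for the OLE2-with-password
    // inspection; it is only invoked in that branch.
    static boolean isXmlFormat(final FileType fileType, final String password,
                               final BooleanSupplier encryptedContainerCheck) {
        final boolean xmlFormat;
        if (fileType == FileType.OOXML) {
            // Unencrypted XLSX file
            xmlFormat = true;
        } else if (fileType != FileType.OLE2 || password == null) {
            // Not an Excel file, or an OLE2 file without a password (plain XLS)
            xmlFormat = false;
        } else {
            // Password-protected OLE2 container: the type must be investigated further
            xmlFormat = encryptedContainerCheck.getAsBoolean();
        }
        return xmlFormat;
    }

    public static void main(final String[] args) {
        System.out.println(isXmlFormat(FileType.OOXML, null, () -> false));    // true
        System.out.println(isXmlFormat(FileType.OLE2, null, () -> true));      // false
        System.out.println(isXmlFormat(FileType.OLE2, "secret", () -> true));  // true
        System.out.println(isXmlFormat(FileType.OTHER, "secret", () -> true)); // false
    }
}
```

Assigning a `final` local in each branch lets the compiler check that every path sets the result exactly once. Passing the expensive check as a supplier keeps it lazy, so the stream is only inspected further when the file is a password-protected OLE2 container.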
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -129,4 +151,37 @@ private boolean hasExhaustedRows() {
}
return exhausted;
}
+
+ private boolean isXSSFExcelFile(InputStream in, final String password) {
+ final FileMagic fileType;
+ try {
+ fileType = FileMagic.valueOf(in);
+ } catch (final IOException e) {
+ throw new ProcessException("Failed to determine file type", e);
+ }
+
+ if (fileType == FileMagic.OOXML) {
+ // Unencrypted XLSX file
+ return true;
+ }
+ if (fileType != FileMagic.OLE2) {
+ // Not an Excel file
+ return false;
+ }
+ if (password == null) {
+ // If there is no password on the OLE2 file, then it's an XLS file. Otherwise, the type must be further
+ // investigated.
+ return false;
+ }
+
+ DirectoryNode root;
+ try {
+ root = new POIFSFileSystem(new NonCloseableInputStream(in)).getRoot();
+ in.reset();
Review Comment:
It looks like the `reset()` call should be moved to a `finally` block, so that
the stream is reset even when an exception is thrown.
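A minimal JDK-only sketch of the mark/inspect/reset-in-`finally` pattern. `StreamInspector` and `inspectThenRewind` are hypothetical names; the inspector plays the role of the `POIFSFileSystem` read in the PR. The sketch also assumes the caller can choose a `readLimit` that covers every byte the inspection consumes, since `InputStream.mark` only guarantees a successful `reset()` within that limit.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResetInFinallySketch {

    // Hypothetical callback for an inspection step that consumes bytes and may throw.
    @FunctionalInterface
    interface StreamInspector<T> {
        T inspect(InputStream in) throws IOException;
    }

    // Marks the stream, runs the inspection, and always rewinds to the mark.
    static <T> T inspectThenRewind(final InputStream in, final int readLimit,
                                   final StreamInspector<T> inspector) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Stream must support mark/reset");
        }
        in.mark(readLimit);
        try {
            return inspector.inspect(in);
        } finally {
            // Runs on success and on failure, so callers always get the stream back at the mark.
            in.reset();
        }
    }

    public static void main(final String[] args) throws IOException {
        final InputStream in = new BufferedInputStream(
                new ByteArrayInputStream("PK-sample-bytes".getBytes(StandardCharsets.US_ASCII)));
        try {
            inspectThenRewind(in, 64, s -> {
                s.read();
                s.read();
                throw new IOException("simulated parse failure");
            });
        } catch (final IOException e) {
            System.out.println("Inspection failed: " + e.getMessage());
        }
        // The stream was rewound in the finally block, so the first byte is readable again.
        System.out.println("First byte after rewind: " + (char) in.read());
    }
}
```

One trade-off: if `reset()` itself throws inside `finally`, its exception replaces the one from the inspection. A fuller version could catch that failure and attach it to the original exception with `addSuppressed`.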
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -129,4 +151,37 @@ private boolean hasExhaustedRows() {
}
return exhausted;
}
+
+ private boolean isXSSFExcelFile(InputStream in, final String password) {
Review Comment:
Minor naming recommendation:
```suggestion
private boolean isXmlFormat(final InputStream in, final String password) {
```
--
This is an automated message from the Apache Git Service.