exceptionfactory commented on code in PR #9874:
URL: https://github.com/apache/nifi/pull/9874#discussion_r2052503679


##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -42,13 +51,26 @@ class RowIterator implements Iterator<Row>, Closeable {
     private Row currentRow;
 
     RowIterator(final InputStream in, final ExcelRecordReaderConfiguration configuration, final ComponentLog logger) {
-        this.workbook = StreamingReader.builder()
-                .rowCacheSize(100)
-                .bufferSize(4096)
-                .password(configuration.getPassword())
-                .setAvoidTempFiles(configuration.isAvoidTempFiles())
-                .setReadSharedFormulas(true) // NOTE: If not set to true, then data with shared formulas fail.
-                .open(in);
+        if (isXSSFExcelFile(in, configuration.getPassword())) {
+            this.workbook = StreamingReader.builder()
+                    .rowCacheSize(100)
+                    .bufferSize(4096)
+                    .password(configuration.getPassword())
+                    .setAvoidTempFiles(configuration.isAvoidTempFiles())
+                    .setReadSharedFormulas(true) // NOTE: If not set to true, then data with shared formulas fail.
+                    .open(in);
+        } else {
+            // Providing the password to the HSSFWorkbook is done by setting a thread variable managed by
+            // Biff8EncryptionKey. After the workbook is created, the thread variable can be cleared.
+            Biff8EncryptionKey.setCurrentUserPassword(configuration.getPassword());

Review Comment:
   This is problematic for multi-threaded use of ExcelReader, which is quite possible. Under the circumstances, it seems best not to attempt to support encrypted XLS files right now. Without that support, `HSSFWorkbook` will likely throw exceptions when it attempts to read such files, but that is acceptable under the circumstances, as it is a limitation of the library.



##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -129,4 +151,37 @@ private boolean hasExhaustedRows() {
         }
         return exhausted;
     }
+
+    private boolean isXSSFExcelFile(InputStream in, final String password) {
+        final FileMagic fileType;
+        try {
+            fileType = FileMagic.valueOf(in);
+        } catch (final IOException e) {
+            throw new ProcessException("Failed to determine file type", e);
+        }
+
+        if (fileType == FileMagic.OOXML) {
+            // Unencrypted XLSX file
+            return true;
+        }
+        if (fileType != FileMagic.OLE2) {
+            // Not an Excel file
+            return false;
+        }
+        if (password == null) {
+            // If there is no password on the OLE2 file, then it's an XLS file. Otherwise, the type must be further
+            // investigated.
+            return false;
+        }

Review Comment:
   It would be helpful to refactor this method to use a single return.
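
Below is a sketch of the single-return shape. So that it compiles without POI, a hypothetical `FileType` enum stands in for POI's `FileMagic`. The rest of the original method (the OLE2-with-password inspection) is not shown in the hunk, so the hypothetical `encryptedOoxmlCheck` parameter represents it:

```java
import java.util.function.BooleanSupplier;

final class SingleReturnSketch {
    // Hypothetical stand-in for the FileMagic values the method distinguishes.
    enum FileType { OOXML, OLE2, OTHER }

    // Single-return version of the type check. The OLE2-with-password branch is
    // delegated to a caller-supplied check (hypothetical parameter), because that
    // part of the original method is not shown in the diff.
    static boolean isXmlFormat(final FileType fileType, final String password,
                               final BooleanSupplier encryptedOoxmlCheck) {
        final boolean xmlFormat;
        if (fileType == FileType.OOXML) {
            // Unencrypted XLSX file
            xmlFormat = true;
        } else if (fileType == FileType.OLE2 && password != null) {
            // Password-protected OLE2 container: the type must be investigated further
            xmlFormat = encryptedOoxmlCheck.getAsBoolean();
        } else {
            // Not an Excel file, or an unencrypted XLS file
            xmlFormat = false;
        }
        return xmlFormat;
    }
}
```

Each branch of the early-return version maps to one assignment here. Assigning a `final` local means the compiler rejects any branch that forgets to set the result.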



##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -129,4 +151,37 @@ private boolean hasExhaustedRows() {
         }
         return exhausted;
     }
+
+    private boolean isXSSFExcelFile(InputStream in, final String password) {
+        final FileMagic fileType;
+        try {
+            fileType = FileMagic.valueOf(in);
+        } catch (final IOException e) {
+            throw new ProcessException("Failed to determine file type", e);
+        }
+
+        if (fileType == FileMagic.OOXML) {
+            // Unencrypted XLSX file
+            return true;
+        }
+        if (fileType != FileMagic.OLE2) {
+            // Not an Excel file
+            return false;
+        }
+        if (password == null) {
+            // If there is no password on the OLE2 file, then it's an XLS file. Otherwise, the type must be further
+            // investigated.
+            return false;
+        }
+
+        DirectoryNode root;
+        try {
+            root = new POIFSFileSystem(new NonCloseableInputStream(in)).getRoot();
+            in.reset();

Review Comment:
   It looks like the `reset()` call should be moved to a `finally` block so that the stream is rewound even when an exception is thrown.
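
Below is a minimal JDK-only sketch of the mark, inspect, and reset-in-`finally` pattern. The hypothetical `inspectThenReset` helper stands in for the `POIFSFileSystem` inspection. It assumes the stream supports `mark`/`reset`; the hunk does not show where the original code sets the mark:

```java
import java.io.IOException;
import java.io.InputStream;

final class RewindSketch {
    @FunctionalInterface
    interface StreamInspector<T> {
        T inspect(InputStream in) throws IOException;
    }

    // Marks the stream, runs an inspection that may consume bytes or throw,
    // and always rewinds in finally so the caller can re-read from the mark.
    static <T> T inspectThenReset(final InputStream in, final int readLimit,
                                  final StreamInspector<T> inspector) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("InputStream must support mark/reset");
        }
        in.mark(readLimit);
        try {
            return inspector.inspect(in);
        } finally {
            in.reset();
        }
    }
}
```

This pattern has two caveats:
- For a `BufferedInputStream`, `reset()` may fail if more than `readLimit` bytes were read after the mark, so the limit must cover everything the inspection consumes.
- An `IOException` thrown by `reset()` in the `finally` block replaces any exception thrown by the inspection.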



##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -129,4 +151,37 @@ private boolean hasExhaustedRows() {
         }
         return exhausted;
     }
+
+    private boolean isXSSFExcelFile(InputStream in, final String password) {

Review Comment:
   Minor naming recommendation:
   ```suggestion
       private boolean isXmlFormat(final InputStream in, final String password) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
