exceptionfactory commented on code in PR #9874:
URL: https://github.com/apache/nifi/pull/9874#discussion_r2052863689


##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -42,13 +51,26 @@ class RowIterator implements Iterator<Row>, Closeable {
     private Row currentRow;
 
     RowIterator(final InputStream in, final ExcelRecordReaderConfiguration 
configuration, final ComponentLog logger) {
-        this.workbook = StreamingReader.builder()
-                .rowCacheSize(100)
-                .bufferSize(4096)
-                .password(configuration.getPassword())
-                .setAvoidTempFiles(configuration.isAvoidTempFiles())
-                .setReadSharedFormulas(true) // NOTE: If not set to true, then 
data with shared formulas fail.
-                .open(in);
+        if (isXSSFExcelFile(in, configuration.getPassword())) {
+            this.workbook = StreamingReader.builder()
+                    .rowCacheSize(100)
+                    .bufferSize(4096)
+                    .password(configuration.getPassword())
+                    .setAvoidTempFiles(configuration.isAvoidTempFiles())
+                    .setReadSharedFormulas(true) // NOTE: If not set to true, 
then data with shared formulas fail.
+                    .open(in);
+        } else {
+            // Providing the password to the HSSFWorkbook is done by setting a 
thread variable managed by
+            // Biff8EncryptionKey. After the workbook is created, the thread 
variable can be cleared.
+            
Biff8EncryptionKey.setCurrentUserPassword(configuration.getPassword());

Review Comment:
   Although making the Input File Type a configurable property is an option, 
there are positives and negatives to consider requiring the configuration. On 
the one hand, IdentifyMimeType could be used with a RouteOnAttribute Processor 
to determine the appropriate routing. On the other hand, that increases the 
potential flow design complexity when attempting to support both XLS and XLSX.
   
   I agree that introducing a new configuration property would simplify this 
logic and remove the need for File Type detection.
   
   Given the mismatch in functionality, with and without support for 
password-protection, there does seem to be some value in requiring one type or 
the other.
   
   It is also worth noting that XLS has not been supported previously, and that 
it has different memory usage characteristics as previously discussed.
   
   With that in mind, I would be open to adding a new property named `Input 
File Type` with `XLSX` and `XLS` values, where `XLSX` would be the default 
type. The Property could support Expression Language, which is resolvable from 
the Map of variables passed to `createRecordReader()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to