zhtk commented on code in PR #9874:
URL: https://github.com/apache/nifi/pull/9874#discussion_r2075613587


##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -42,13 +51,26 @@ class RowIterator implements Iterator<Row>, Closeable {
     private Row currentRow;
 
     RowIterator(final InputStream in, final ExcelRecordReaderConfiguration 
configuration, final ComponentLog logger) {
-        this.workbook = StreamingReader.builder()
-                .rowCacheSize(100)
-                .bufferSize(4096)
-                .password(configuration.getPassword())
-                .setAvoidTempFiles(configuration.isAvoidTempFiles())
-                .setReadSharedFormulas(true) // NOTE: If not set to true, then 
data with shared formulas fail.
-                .open(in);
+        if (isXSSFExcelFile(in, configuration.getPassword())) {
+            this.workbook = StreamingReader.builder()
+                    .rowCacheSize(100)
+                    .bufferSize(4096)
+                    .password(configuration.getPassword())
+                    .setAvoidTempFiles(configuration.isAvoidTempFiles())
+                    .setReadSharedFormulas(true) // NOTE: If not set to true, 
then data with shared formulas fail.
+                    .open(in);
+        } else {
+            // Providing the password to the HSSFWorkbook is done by setting a 
thread variable managed by
+            // Biff8EncryptionKey. After the workbook is created, the thread 
variable can be cleared.
+            
Biff8EncryptionKey.setCurrentUserPassword(configuration.getPassword());

Review Comment:
   @dan-s1, @exceptionfactory, I'm clear, although I'm worried that for my use 
case sooner or later I'll need to support encryption. I already have users 
asking if file decryption will be supported. I don't exactly get why thread 
safety is the issue - I checked with debugger, that password is read and 
decrypting stream is created when HSSFWorkbook instance is created. Here is 
stack trace:
   
   ```
   getCurrentUserPassword:52, Biff8EncryptionKey 
(org.apache.poi.hssf.record.crypto)
   createDecryptingStream:107, RecordFactoryInputStream$StreamEncryptionInfo 
(org.apache.poi.hssf.record)
   <init>:189, RecordFactoryInputStream (org.apache.poi.hssf.record)
   createRecords:184, RecordFactory (org.apache.poi.hssf.record)
   <init>:377, HSSFWorkbook (org.apache.poi.hssf.usermodel)
   <init>:437, HSSFWorkbook (org.apache.poi.hssf.usermodel)
   <init>:417, HSSFWorkbook (org.apache.poi.hssf.usermodel)
   <init>:67, RowIterator (org.apache.nifi.excel)
   ```
   
   So even though POI interface is skewed that it uses thread variables, maybe 
it's not a problem to set thread variable and immediately create decrypted 
stream using this variable? It seems like the variable won't be used later and 
can be discarded. Or is it possible that thread execution can be interrupted 
when HSSFWorkbook constructor executed, the original thread reassigned to 
another task, and constructor execution resumed by another thread? Even if the 
thread is interrupted and thread variable with password remains set on the 
table, during next procedure execution it will be cleared and set to a 
currently needed value.
   
   I'm fine with adding `Input File Type` property even if the file encryption 
will eventually be supported, and agree with provided argumentation.
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to