zhtk commented on code in PR #9874:
URL: https://github.com/apache/nifi/pull/9874#discussion_r2075613587
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -42,13 +51,26 @@ class RowIterator implements Iterator<Row>, Closeable {
private Row currentRow;
RowIterator(final InputStream in, final ExcelRecordReaderConfiguration
configuration, final ComponentLog logger) {
- this.workbook = StreamingReader.builder()
- .rowCacheSize(100)
- .bufferSize(4096)
- .password(configuration.getPassword())
- .setAvoidTempFiles(configuration.isAvoidTempFiles())
- .setReadSharedFormulas(true) // NOTE: If not set to true, then
data with shared formulas fail.
- .open(in);
+ if (isXSSFExcelFile(in, configuration.getPassword())) {
+ this.workbook = StreamingReader.builder()
+ .rowCacheSize(100)
+ .bufferSize(4096)
+ .password(configuration.getPassword())
+ .setAvoidTempFiles(configuration.isAvoidTempFiles())
+ .setReadSharedFormulas(true) // NOTE: If not set to true,
then data with shared formulas fail.
+ .open(in);
+ } else {
+ // Providing the password to the HSSFWorkbook is done by setting a
thread variable managed by
+ // Biff8EncryptionKey. After the workbook is created, the thread
variable can be cleared.
+
Biff8EncryptionKey.setCurrentUserPassword(configuration.getPassword());
Review Comment:
@dan-s1, @exceptionfactory, I'm clear, although I'm worried that for my use
case sooner or later I'll need to support encryption. I already have users
asking if file decryption will be supported. I don't exactly get why thread
safety is the issue - I checked with debugger, that password is read and
decrypting stream is created when HSSFWorkbook instance is created. Here is
stack trace:
```
getCurrentUserPassword:52, Biff8EncryptionKey
(org.apache.poi.hssf.record.crypto)
createDecryptingStream:107, RecordFactoryInputStream$StreamEncryptionInfo
(org.apache.poi.hssf.record)
<init>:189, RecordFactoryInputStream (org.apache.poi.hssf.record)
createRecords:184, RecordFactory (org.apache.poi.hssf.record)
<init>:377, HSSFWorkbook (org.apache.poi.hssf.usermodel)
<init>:437, HSSFWorkbook (org.apache.poi.hssf.usermodel)
<init>:417, HSSFWorkbook (org.apache.poi.hssf.usermodel)
<init>:67, RowIterator (org.apache.nifi.excel)
```
So even though POI interface is skewed that it uses thread variables, maybe
it's not a problem to set thread variable and immediately create decrypted
stream using this variable? It seems like the variable won't be used later and
can be discarded. Or is it possible that thread execution can be interrupted
when HSSFWorkbook constructor executed, the original thread reassigned to
another task, and constructor execution resumed by another thread? Even if the
thread is interrupted and thread variable with password remains set on the
table, during next procedure execution it will be cleared and set to a
currently needed value.
I'm fine with adding `Input File Type` property even if the file encryption
will eventually be supported, and agree with provided argumentation.
WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]