exceptionfactory commented on code in PR #9874:
URL: https://github.com/apache/nifi/pull/9874#discussion_r2052863689
##########
nifi-extension-bundles/nifi-poi-bundle/nifi-poi-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##########
@@ -42,13 +51,26 @@ class RowIterator implements Iterator<Row>, Closeable {
private Row currentRow;
RowIterator(final InputStream in, final ExcelRecordReaderConfiguration
configuration, final ComponentLog logger) {
- this.workbook = StreamingReader.builder()
- .rowCacheSize(100)
- .bufferSize(4096)
- .password(configuration.getPassword())
- .setAvoidTempFiles(configuration.isAvoidTempFiles())
- .setReadSharedFormulas(true) // NOTE: If not set to true, then
data with shared formulas fail.
- .open(in);
+ if (isXSSFExcelFile(in, configuration.getPassword())) {
+ this.workbook = StreamingReader.builder()
+ .rowCacheSize(100)
+ .bufferSize(4096)
+ .password(configuration.getPassword())
+ .setAvoidTempFiles(configuration.isAvoidTempFiles())
+ .setReadSharedFormulas(true) // NOTE: If not set to true,
then data with shared formulas fail.
+ .open(in);
+ } else {
+ // Providing the password to the HSSFWorkbook is done by setting a
thread variable managed by
+ // Biff8EncryptionKey. After the workbook is created, the thread
variable can be cleared.
+
Biff8EncryptionKey.setCurrentUserPassword(configuration.getPassword());
Review Comment:
Although making the Input File Type a configurable property is an option,
there are positives and negatives to consider requiring the configuration. On
the one hand, IdentifyMimeType could be used with a RouteOnAttribute Processor
to determine the appropriate routing. On the other hand, that increases the
potential flow design complexity when attempting to support both XLS and XLSX.
I agree that introducing a new configuration property would simplify this
logic and remove the need for File Type detection.
Given the mismatch in functionality, with and without support for
password-protection, there does seem to be some value in requiring one type or
the other.
It is also worth noting that XLS has not been supported previously, and that
it has different memory usage characteristics as previously discussed.
With that in mind, I would be open to adding a new property named `Input
File Type` with `XLSX` and `XLS` values, where `XLSX` would be the default
type. The Property could support Expression Language, which is resolvable from
the Map of variables passed to `createRecordReader()`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]