Brendan Buhr created NIFI-12510:
-----------------------------------
Summary: Excel Record Reader | Extend Functionality
Key: NIFI-12510
URL: https://issues.apache.org/jira/browse/NIFI-12510
Project: Apache NiFi
Issue Type: Improvement
Reporter: Brendan Buhr
As users, we process files that are not always perfect and that originate from
sources beyond our control, and we therefore need certain functionality to be
able to manipulate these files.
With the upcoming deprecation of the ConvertExcelToCSVProcessor in NiFi v2, we
have a vested interest in the new Excel Record Reader being able to handle all
scenarios currently catered for by the existing ConvertExcelToCSVProcessor.
We have been testing the new Excel Record Reader (NiFi 1.24.0) and have run
into issues similar to those encountered by David Handermann, in particular
this error:
java.lang.IllegalStateException: This cell has a shared formula and it seems
setReadSharedFormulas has been set to false or the formula can't be evaluated
We also currently process Excel files that can best be described as reports
rather than data files, i.e. the files may not have a header row and may also
contain multiple datasets in a single sheet.
The “Columns To Skip” option on the ConvertExcelToCSVProcessor helped us bring
in Excel files and remove columns (sometimes blank ones), especially when a
schema could not be applied because of the way the data had been populated on
the sheets.
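To illustrate the behaviour we rely on, here is a minimal sketch in Python of
dropping columns from already-parsed rows. It is not the NiFi implementation;
the function name skip_columns, the sample rows, and the zero-based column
indexes are assumptions made for this example only.

```python
from typing import Iterable, List, Sequence, Set


def skip_columns(rows: Iterable[Sequence[str]], columns_to_skip: Set[int]) -> List[List[str]]:
    """Drop the given column indexes from every row.

    Indexes are zero-based here; that is an assumption of this sketch,
    not a statement about how the NiFi option numbers columns.
    """
    return [
        [cell for index, cell in enumerate(row) if index not in columns_to_skip]
        for row in rows
    ]


# Hypothetical report-style sheet with a blank first column.
sheet_rows = [
    ["", "Region", "Total"],
    ["", "North", "120"],
    ["", "South", "95"],
]

# Remove the blank leading column before any schema is applied.
cleaned = skip_columns(sheet_rows, columns_to_skip={0})
print(cleaned)
```

An equivalent option on the Excel Record Reader would let us remove such
columns before the records reach the rest of the flow.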
The “Format Cell Values” option on the ConvertExcelToCSVProcessor is also used
extensively by us to retain any formatting our clients have applied in the
Excel files. Because the option is optional, we can toggle it as the situation
requires.
One other issue we have found with the new Excel Record Reader is that when an
Excel file has multiple tabs, it simply merges the output into a single
flowfile, regardless of the shape of the data, whereas previously, with the
ConvertExcelToCSVProcessor, we would get one flowfile per tab (with the tab
name appended to the filename).
I do hope that we can discuss these points in more detail.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)