[
https://issues.apache.org/jira/browse/NIFI-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brendan Buhr updated NIFI-12510:
--------------------------------
Description:
As users, we process files that are not always perfect and that originate from
sources beyond our control, and we therefore need certain functionality to be
able to manipulate these files.
With the upcoming deprecation of the ConvertExcelToCSVProcessor processor in
NiFi v2, we have a vested interest in the new Excel Record Reader being able to
handle all scenarios currently catered for by the existing
ConvertExcelToCSVProcessor processor.
We have been testing the new Excel Record Reader (NiFi 1.24.0) and have run
into issues similar to those encountered by David Handermann, particularly this
error:
{code:java}
java.lang.IllegalStateException: This cell has a shared formula and it seems
setReadSharedFormulas has been set to false or the formula can't be
evaluated{code}
We also currently process Excel files that can best be described as reports
rather than data files, i.e. the files may not have a header row and may also
contain multiple datasets in a single sheet.
The “{*}Columns To Skip{*}” option on the ConvertExcelToCSVProcessor processor
helped us bring in Excel files and remove columns (sometimes blank ones),
especially when a schema could not be applied due to the way data had been
populated on the sheets.
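To illustrate the column-skipping behaviour we rely on, here is a minimal sketch in plain Java. It is not NiFi or Apache POI code: the class and method names are hypothetical, rows are modelled as simple lists of strings, and it assumes that columns are identified by 1-based number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ColumnSkipSketch {

    // Returns a copy of the row without the cells whose 1-based column
    // number appears in columnsToSkip (the numbering is an assumption).
    public static List<String> skipColumns(List<String> row, Set<Integer> columnsToSkip) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (!columnsToSkip.contains(i + 1)) {
                kept.add(row.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A report-style row with a blank spacer column and a notes column.
        List<String> row = List.of("Region", "", "Total", "notes");
        System.out.println(skipColumns(row, Set.of(2, 4)));
    }
}
```

Dropping columns by position like this works even when no header row or schema is available, which is the situation described above.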
The “{*}Format Cell Values{*}” option on the ConvertExcelToCSVProcessor
processor is also used extensively by us to retain any formatting our clients
have applied in the Excel files. Because the option can be switched off to
ignore formatting, we can toggle it as the situation requires.
One other issue we have found when using the new Excel Record Reader is that
when an Excel file has multiple tabs, it simply merges the output into a single
flowfile, regardless of the shape of the data, whereas previously, when using
the ConvertExcelToCSVProcessor processor, we would get a flowfile per tab (with
the tab name appended to the filename).
I do hope that we can discuss these points in more detail.
> Excel Record Reader | Extend Functionality
> ------------------------------------------
>
> Key: NIFI-12510
> URL: https://issues.apache.org/jira/browse/NIFI-12510
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Brendan Buhr
> Priority: Major