[ 
https://issues.apache.org/jira/browse/NIFI-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brendan Buhr updated NIFI-12510:
--------------------------------
    Description: 
As users, we process files that are not always clean and that originate from
sources beyond our control, and we therefore need certain functionality to
manipulate these files.

With the upcoming deprecation of ConvertExcelToCSVProcessor in NiFi 2, we have
a vested interest in the new Excel Record Reader handling all the scenarios that
ConvertExcelToCSVProcessor currently supports.

We have been testing the new Excel Record Reader (NiFi 1.24.0) and have run into
issues similar to those encountered by David Handermann, in particular this
error:
{code:java}
java.lang.IllegalStateException: This cell has a shared formula and it seems setReadSharedFormulas has been set to false or the formula can't be evaluated
{code}
We also process Excel files that are best described as reports rather than data
files: they may have no header row and may contain multiple datasets in a single
sheet.
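
To make the report case concrete, here is a hypothetical sketch (not NiFi code;
the class and method names are our own) of one possible heuristic: treat each
fully blank row as the boundary between datasets, and give headerless data
positional column names. It assumes the sheet has already been read into rows
of strings.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NiFi: splits the rows of one sheet into
// separate datasets wherever a fully blank row appears.
final class ReportSplitter {

    private ReportSplitter() {
    }

    // A row counts as blank when every cell is null or whitespace only.
    static boolean isBlank(List<String> row) {
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Groups consecutive non-blank rows; one or more blank rows close the current block.
    static List<List<List<String>>> splitOnBlankRows(List<List<String>> rows) {
        List<List<List<String>>> datasets = new ArrayList<>();
        List<List<String>> current = new ArrayList<>();
        for (List<String> row : rows) {
            if (isBlank(row)) {
                if (!current.isEmpty()) {
                    datasets.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(row);
            }
        }
        if (!current.isEmpty()) {
            datasets.add(current);
        }
        return datasets;
    }

    // For headerless data, generate positional names: column_1, column_2, ...
    static List<String> positionalHeader(int width) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= width; i++) {
            names.add("column_" + i);
        }
        return names;
    }
}
{code}

A blank-row rule is only one option; real reports may separate datasets in
other ways (titles, merged cells), so this is meant to illustrate the need
rather than propose a design.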

The “{*}Columns To Skip{*}” property of ConvertExcelToCSVProcessor lets us
ingest Excel files while dropping unwanted columns (sometimes blank ones), which
is especially useful when a schema cannot be applied because of the way the
data has been laid out on the sheets.
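
For reference, a minimal sketch of the column-skipping behaviour we rely on,
written against plain string rows rather than NiFi's API. It assumes the setting
is a comma-delimited list of 1-based column numbers (not letters); the class
name is hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not NiFi's code: mimics a "Columns To Skip" style
// setting, assumed to list 1-based column numbers separated by commas.
final class ColumnSkipper {

    private ColumnSkipper() {
    }

    // Parses e.g. "1, 3" into the 0-based indices {0, 2}; empty entries are ignored.
    static Set<Integer> parseColumnsToSkip(String value) {
        Set<Integer> skip = new HashSet<>();
        if (value == null) {
            return skip;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                int columnNumber = Integer.parseInt(trimmed);
                if (columnNumber < 1) {
                    throw new IllegalArgumentException("Column numbers start at 1: " + trimmed);
                }
                skip.add(columnNumber - 1);
            }
        }
        return skip;
    }

    // Returns a copy of the row without the skipped columns.
    static List<String> apply(List<String> row, Set<Integer> skip) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (!skip.contains(i)) {
                kept.add(row.get(i));
            }
        }
        return kept;
    }
}
{code}

For example, skipping "1, 3" on the row {{["", "A", "", "B"]}} keeps
{{["A", "B"]}}.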

We also rely heavily on the “{*}Format Cell Values{*}” property of
ConvertExcelToCSVProcessor to retain the formatting our clients have applied in
their Excel files. Because the property is optional, we can also ignore that
formatting and toggle it as the situation requires.
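
A hypothetical sketch of the toggle we would like to keep: with formatting on,
the output follows the cell's format; with it off, the raw stored value is used.
Java's DecimalFormat patterns stand in for Excel format codes here, which is a
simplification; a real implementation would need an Excel-aware formatter such
as Apache POI's DataFormatter.

{code:java}
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch, not NiFi code: a boolean toggle choosing between a
// cell's raw numeric value and a display string built from a format pattern.
// DecimalFormat patterns are a stand-in; Excel format codes differ.
final class CellValueFormatter {

    private final boolean formatCellValues;

    CellValueFormatter(boolean formatCellValues) {
        this.formatCellValues = formatCellValues;
    }

    String render(double rawValue, String pattern) {
        if (!formatCellValues || pattern == null) {
            // Raw mode: the stored number, with no client formatting applied.
            return Double.toString(rawValue);
        }
        // Fixed US symbols so the output does not depend on the JVM's default locale.
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(rawValue);
    }
}
{code}

With the pattern {{#,##0.00}}, the value 1234.5 renders as {{1,234.50}} when
formatting is on and as {{1234.5}} when it is off.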

One other issue we have found with the new Excel Record Reader is that, when an
Excel file has multiple tabs, it merges the output into a single flowfile
regardless of the shape of the data, whereas ConvertExcelToCSVProcessor produces
one flowfile per tab (with the tab name appended to the filename).

I hope we can discuss these points in more detail.
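
A hypothetical sketch of the per-tab behaviour, assuming each sheet has already
been read into rows of strings. The {{<base>_<sheet>.csv}} naming is our
assumption of what "tab name appended to the filename" looks like, not a
confirmed detail of ConvertExcelToCSVProcessor, and the class is not NiFi code.

{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: emit one CSV document per sheet instead of merging
// all sheets into a single output. Naming scheme "<base>_<sheet>.csv" is assumed.
final class PerSheetSplitter {

    private PerSheetSplitter() {
    }

    // Strips the last extension, e.g. "report.xlsx" -> "report".
    static String baseName(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot > 0 ? filename.substring(0, dot) : filename;
    }

    // Very small CSV writer: quotes a cell only when it contains a comma, quote or newline.
    static String toCsv(List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    out.append(',');
                }
                String cell = row.get(i) == null ? "" : row.get(i);
                if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
                    cell = "\"" + cell.replace("\"", "\"\"") + "\"";
                }
                out.append(cell);
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Sheet order is preserved; each entry maps an output filename to its CSV content.
    static Map<String, String> split(String filename, Map<String, List<List<String>>> sheets) {
        Map<String, String> outputs = new LinkedHashMap<>();
        String base = baseName(filename);
        for (Map.Entry<String, List<List<String>>> sheet : sheets.entrySet()) {
            outputs.put(base + "_" + sheet.getKey() + ".csv", toCsv(sheet.getValue()));
        }
        return outputs;
    }
}
{code}

For example, a file {{report.xlsx}} with tabs "Summary" and "Detail" yields two
outputs, {{report_Summary.csv}} and {{report_Detail.csv}}, each shaped by its
own tab's data.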

  was:
As users, we process files that are not always clean and that originate from
sources beyond our control, and we therefore need certain functionality to
manipulate these files.

With the upcoming deprecation of ConvertExcelToCSVProcessor in NiFi 2, we have
a vested interest in the new Excel Record Reader handling all the scenarios that
ConvertExcelToCSVProcessor currently supports.

We have been testing the new Excel Record Reader (NiFi 1.24.0) and have run into
issues similar to those encountered by David Handermann, in particular this
error:
java.lang.IllegalStateException: This cell has a shared formula and it seems 
setReadSharedFormulas has been set to false or the formula can't be evaluated
We also process Excel files that are best described as reports rather than data
files: they may have no header row and may contain multiple datasets in a single
sheet.

The “{*}Columns To Skip{*}” property of ConvertExcelToCSVProcessor lets us
ingest Excel files while dropping unwanted columns (sometimes blank ones), which
is especially useful when a schema cannot be applied because of the way the
data has been laid out on the sheets.

We also rely heavily on the “{*}Format Cell Values{*}” property of
ConvertExcelToCSVProcessor to retain the formatting our clients have applied in
their Excel files. Because the property is optional, we can also ignore that
formatting and toggle it as the situation requires.

One other issue we have found with the new Excel Record Reader is that, when an
Excel file has multiple tabs, it merges the output into a single flowfile
regardless of the shape of the data, whereas ConvertExcelToCSVProcessor produces
one flowfile per tab (with the tab name appended to the filename).

I hope we can discuss these points in more detail.


> Excel Record Reader | Extend Functionality
> ------------------------------------------
>
>                 Key: NIFI-12510
>                 URL: https://issues.apache.org/jira/browse/NIFI-12510
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Brendan Buhr
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
