[ https://issues.apache.org/jira/browse/NIFI-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brendan Buhr updated NIFI-12510:
--------------------------------
    Description:
As a user, we process files that are not always perfect and that originate from sources beyond our control, and we therefore need certain functionality to be able to manipulate these files.

With the upcoming deprecation of the ConvertExcelToCSVProcessor processor in NiFi v2, we have a vested interest in the new Excel Record Reader being able to handle all scenarios currently catered for by the existing ConvertExcelToCSVProcessor processor.

We have been testing the new Excel Record Reader (NiFi 1.24.0) and have run into issues similar to those encountered by David Handermann, in particular this error:
{code:java}
java.lang.IllegalStateException: This cell has a shared formula and it seems setReadSharedFormulas has been set to false or the formula can't be evaluated{code}
We also process Excel files that are best described as reports rather than data files, i.e. the files may not have a header row and may contain multiple datasets in a single sheet.

The “{*}Columns To Skip{*}” option on the ConvertExcelToCSVProcessor processor helped us bring in Excel files and remove columns (sometimes blank ones), especially when a schema could not be applied because of the way data had been populated on the sheets.

The “{*}Format Cell Values{*}” option on the ConvertExcelToCSVProcessor processor is also used extensively by us to retain any formatting our clients have applied in the Excel files. Being able to toggle it, so that formatting is either retained or ignored as the situation permits, is important to us.

One other issue we have found with the new Excel Record Reader is that when an Excel file has multiple tabs, it simply merges the output into a single flowfile, regardless of the shape of the data, whereas the ConvertExcelToCSVProcessor processor previously gave us one flowfile per tab (with the tab name appended to the filename).

I do hope that we can discuss these points in more detail.
> Excel Record Reader | Extend Functionality
> ------------------------------------------
>
>                 Key: NIFI-12510
>                 URL: https://issues.apache.org/jira/browse/NIFI-12510
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Brendan Buhr
>            Priority: Major
>
> As a user, we process files that are not always perfect and that originate
> from sources beyond our control, and we therefore need certain functionality
> to be able to manipulate these files.
> With the upcoming deprecation of the ConvertExcelToCSVProcessor processor in
> NiFi v2, we have a vested interest in the new Excel Record Reader being able
> to handle all scenarios currently catered for by the existing
> ConvertExcelToCSVProcessor processor.
> We have been testing the new Excel Record Reader (NiFi 1.24.0) and have run
> into issues similar to those encountered by David Handermann, in particular
> this error:
> {code:java}
> java.lang.IllegalStateException: This cell has a shared formula and it seems
> setReadSharedFormulas has been set to false or the formula can't be
> evaluated{code}
> We also process Excel files that are best described as reports rather than
> data files, i.e. the files may not have a header row and may contain
> multiple datasets in a single sheet.
> The “{*}Columns To Skip{*}” option on the ConvertExcelToCSVProcessor
> processor helped us bring in Excel files and remove columns (sometimes blank
> ones), especially when a schema could not be applied because of the way data
> had been populated on the sheets.
> The “{*}Format Cell Values{*}” option on the ConvertExcelToCSVProcessor
> processor is also used extensively by us to retain any formatting our
> clients have applied in the Excel files. Being able to toggle it, so that
> formatting is either retained or ignored as the situation permits, is
> important to us.
> One other issue we have found with the new Excel Record Reader is that when
> an Excel file has multiple tabs, it simply merges the output into a single
> flowfile, regardless of the shape of the data, whereas the
> ConvertExcelToCSVProcessor processor previously gave us one flowfile per tab
> (with the tab name appended to the filename).
> I do hope that we can discuss these points in more detail.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)