[jira] [Commented] (NIFI-13988) ExcelReader - Use Starting Row schema strategy and string empty values

Philipp Korniets (Jira) Fri, 08 Nov 2024 07:22:25 -0800


    [ 
https://issues.apache.org/jira/browse/NIFI-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896701#comment-17896701
 ]


Philipp Korniets commented on NIFI-13988:
-----------------------------------------

Also, I noticed that ConvertExcelToCSV uses string types everywhere:
https://github.com/apache/nifi/blob/8ecf23e77c8ca828a77f3b84554ed3347d8f7fa2/nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java#L475C1-L480C14

 
{code:java}
            String stringCellValue = cell.getStringCellValue();
            final CellType type = cell.getCellType();
            if(type.equals(CellType.BOOLEAN) && formatBooleans) {
                stringCellValue = stringCellValue.equals("1") ? "TRUE" : "FALSE";
            }
{code}
Given that we can't control what data comes from the vendor, would this be a 
better approach, or at least an option?

 

> ExcelReader - Use Starting Row schema strategy and string empty values
> ----------------------------------------------------------------------
>
>                 Key: NIFI-13988
>                 URL: https://issues.apache.org/jira/browse/NIFI-13988
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 2.0.0
>            Reporter: Philipp Korniets
>            Assignee: Daniel Stieglitz
>            Priority: Major
>         Attachments: Test workbook NiFi2_0.xlsx
>
>
> When Use Starting Row is the schema strategy in ExcelReader, it analyses the 
> first 10 rows. The problem appears with empty cells of *Numerical* type, which 
> can appear anywhere after the first 10 rows. Such a cell *looks like* NULL, but 
> is actually an empty string.
> File with example data attached. 
> Field: Exercise Price. 
> Use Starting Row throws an error:
> {code:java}
> Caused by: java.lang.NumberFormatException: For input string: ""
>     at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
>     at java.base/java.lang.Long.parseLong(Long.java:719)
>     at java.base/java.lang.Long.parseLong(Long.java:832)
>     at org.apache.nifi.serialization.record.util.DataTypeUtils.toLong(DataTypeUtils.java:1391)
>     at org.apache.nifi.serialization.record.util.DataTypeUtils.convertType(DataTypeUtils.java:213)
>     at org.apache.nifi.serialization.record.util.DataTypeUtils.convertType(DataTypeUtils.java:174)
>     at org.apache.nifi.excel.ExcelRecordReader.convert(ExcelRecordReader.java:170)
>     at org.apache.nifi.excel.ExcelRecordReader.lambda$getCurrentRowValues$0(ExcelRecordReader.java:127)
>     at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
>     at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:617)
>     at org.apache.nifi.excel.ExcelRecordReader.getCurrentRowValues(ExcelRecordReader.java:114)
>     at org.apache.nifi.excel.ExcelRecordReader.nextRecord(ExcelRecordReader.java:84)
>     ... 28 common frames omitted{code}
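
The failure in the stack trace above can be reproduced outside NiFi: 
Long.parseLong rejects an empty string, so a column inferred as a long from 
the first rows breaks as soon as a later cell holds "". The following is only 
a rough sketch of one possible mitigation, treating blank values as null 
before numeric conversion. The class BlankCellSketch and the method 
toLongOrNull are hypothetical names used for illustration; they are not part 
of NiFi or of DataTypeUtils, and this is not a proposal for the actual fix.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not NiFi code.
final class BlankCellSketch {

    // Returns null for null or blank input instead of letting
    // Long.parseLong throw NumberFormatException.
    static Long toLongOrNull(final String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        return Long.parseLong(raw.trim());
    }

    public static void main(final String[] args) {
        // Same root cause as in the stack trace: "" is not a valid long.
        try {
            Long.parseLong("");
        } catch (final NumberFormatException e) {
            System.out.println("parseLong failed: " + e.getMessage());
        }

        // Made-up sample values for a column whose inferred type is long.
        final List<String> cells = Arrays.asList("125", "", "  ", "40");
        for (final String cell : cells) {
            System.out.println("[" + cell + "] -> " + toLongOrNull(cell));
        }
    }
}
```

Whether a blank cell should map to null, or whether the reader should fall 
back to strings as ConvertExcelToCSV does, is a design choice this sketch 
does not settle.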



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
