[jira] [Commented] (NIFI-13988) ExcelReader - Use Starting Row schema strategy and string empty values

David Handermann (Jira) Fri, 08 Nov 2024 07:50:06 -0800


    [ 
https://issues.apache.org/jira/browse/NIFI-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896712#comment-17896712
 ]


David Handermann commented on NIFI-13988:
-----------------------------------------

[~dstiegli1] The main reason for limiting the number of rows is for efficiency, 
avoiding the need to read the file twice.

That's a good point [~iiojj2], given the uncertainty, a more flexible solution 
could be to always have {{STRING}} as one of several type choices.

Alternatively, based on the stack trace, it seems like number handling should 
account for an empty string and treat it as a {{null}} when the inferred type 
is a number.

> ExcelReader - Use Starting Row schema strategy and string empty values
> ----------------------------------------------------------------------
>
>                 Key: NIFI-13988
>                 URL: https://issues.apache.org/jira/browse/NIFI-13988
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 2.0.0
>            Reporter: Philipp Korniets
>            Assignee: Daniel Stieglitz
>            Priority: Major
>         Attachments: Test workbook NiFi2_0.xlsx
>
>
> When Use Starting Row as schema strategy in ExcelReader it analyses first 10 
> row. Problem appears with empty cells of  *Numerical* type which can appear 
> anywhere after 10 rows. The cells *looks like* NULL, but actually is an empty 
> string.
> File with example data attached. 
> Field Exercise Price. 
> Use Starting Row throws an error:
> {code:java}
> Caused by: java.lang.NumberFormatException: For input string: ""
>     at 
> java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
>     at java.base/java.lang.Long.parseLong(Long.java:719)
>     at java.base/java.lang.Long.parseLong(Long.java:832)
>     at 
> org.apache.nifi.serialization.record.util.DataTypeUtils.toLong(DataTypeUtils.java:1391)
>     at 
> org.apache.nifi.serialization.record.util.DataTypeUtils.convertType(DataTypeUtils.java:213)
>     at 
> org.apache.nifi.serialization.record.util.DataTypeUtils.convertType(DataTypeUtils.java:174)
>     at 
> org.apache.nifi.excel.ExcelRecordReader.convert(ExcelRecordReader.java:170)
>     at 
> org.apache.nifi.excel.ExcelRecordReader.lambda$getCurrentRowValues$0(ExcelRecordReader.java:127)
>     at 
> java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
>     at 
> java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:617)
>     at 
> org.apache.nifi.excel.ExcelRecordReader.getCurrentRowValues(ExcelRecordReader.java:114)
>     at 
> org.apache.nifi.excel.ExcelRecordReader.nextRecord(ExcelRecordReader.java:84)
>     ... 28 common frames omitted{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (NIFI-13988) ExcelReader - Use Starting Row schema strategy and string empty values

Reply via email to