[jira] [Commented] (NIFI-13988) ExcelReader - Use Starting Row schema strategy and string empty values

Daniel Stieglitz (Jira) Fri, 08 Nov 2024 07:11:32 -0800


    [ 
https://issues.apache.org/jira/browse/NIFI-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896700#comment-17896700
 ]


Daniel Stieglitz commented on NIFI-13988:
-----------------------------------------

[~exceptionfactory] Maybe you remember as I do not why 10 was chosen to be the 
default amount of rows to infer from when using the starting row strategy? I 
seem to remember it was enough to make an assumption that all the rows were 
going be the same but I have looked at NIFI-12491 and the associated PR but I 
do not see that. Would it be okay to refactor the starting row strategy to be 
similar to the Infer Schema strategy and read all the rows to infer from except 
the first row?

> ExcelReader - Use Starting Row schema strategy and string empty values
> ----------------------------------------------------------------------
>
>                 Key: NIFI-13988
>                 URL: https://issues.apache.org/jira/browse/NIFI-13988
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 2.0.0
>            Reporter: Philipp Korniets
>            Assignee: Daniel Stieglitz
>            Priority: Major
>         Attachments: Test workbook NiFi2_0.xlsx
>
>
> When Use Starting Row as schema strategy in ExcelReader it analyses first 10 
> row. Problem appears with empty cells of  *Numerical* type which can appear 
> anywhere after 10 rows. The cells *looks like* NULL, but actually is an empty 
> string.
> File with example data attached. 
> Field Exercise Price. 
> Use Starting Row throws an error:
> {code:java}
> Caused by: java.lang.NumberFormatException: For input string: ""
>     at 
> java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
>     at java.base/java.lang.Long.parseLong(Long.java:719)
>     at java.base/java.lang.Long.parseLong(Long.java:832)
>     at 
> org.apache.nifi.serialization.record.util.DataTypeUtils.toLong(DataTypeUtils.java:1391)
>     at 
> org.apache.nifi.serialization.record.util.DataTypeUtils.convertType(DataTypeUtils.java:213)
>     at 
> org.apache.nifi.serialization.record.util.DataTypeUtils.convertType(DataTypeUtils.java:174)
>     at 
> org.apache.nifi.excel.ExcelRecordReader.convert(ExcelRecordReader.java:170)
>     at 
> org.apache.nifi.excel.ExcelRecordReader.lambda$getCurrentRowValues$0(ExcelRecordReader.java:127)
>     at 
> java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
>     at 
> java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:617)
>     at 
> org.apache.nifi.excel.ExcelRecordReader.getCurrentRowValues(ExcelRecordReader.java:114)
>     at 
> org.apache.nifi.excel.ExcelRecordReader.nextRecord(ExcelRecordReader.java:84)
>     ... 28 common frames omitted{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (NIFI-13988) ExcelReader - Use Starting Row schema strategy and string empty values

Reply via email to