[jira] [Commented] (NIFI-14579) Add parameter to configure number of rows used in schema inference with header in ExcelReader service

David Handermann (Jira) Mon, 19 May 2025 08:52:05 -0700


    [ 
https://issues.apache.org/jira/browse/NIFI-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952662#comment-17952662
 ]


David Handermann commented on NIFI-14579:
-----------------------------------------

[~zhtk] On initial read, making the precise number of rows configurable does 
not seem like the best approach. The general nature of Infer Schema expects a 
varied number of rows from input files, so expecting a flow designer to define 
a specific number of rows could be confusing and difficult to maintain.

Instead of supporting a configurable number of rows, a more general strategy 
property seems like a better option. One value could be "Standard", which 
would retain the current default of 10 rows. Another value could be "All", 
indicating that all rows should be read.
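
As a rough illustration of the strategy idea above, here is a minimal Java 
sketch. The enum name, its values, and the sample() helper are hypothetical 
and are not part of the NiFi API; the only figure taken from this issue is 
the current default of 10 rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: this enum does not exist in NiFi.
enum InferenceRowStrategy {
    STANDARD(10), // keeps the current default of 10 rows
    ALL(-1);      // negative limit means "read every row"

    private final int maxRows;

    InferenceRowStrategy(int maxRows) {
        this.maxRows = maxRows;
    }

    boolean isUnbounded() {
        return maxRows < 0;
    }

    // Collects the rows that schema inference would examine under this strategy.
    <T> List<T> sample(Iterator<T> rows) {
        List<T> sampled = new ArrayList<>();
        while (rows.hasNext() && (isUnbounded() || sampled.size() < maxRows)) {
            sampled.add(rows.next());
        }
        return sampled;
    }
}
```

With named values like these, a flow designer would choose an intent rather 
than maintain a row count per input file.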

Something between those two options raises questions about the data format 
itself. Is the idea that the maximum value would be specified as a FlowFile 
attribute?

> Add parameter to configure number of rows used in schema inference with 
> header in ExcelReader service
> -----------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-14579
>                 URL: https://issues.apache.org/jira/browse/NIFI-14579
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Piotr Zalas
>            Priority: Major
>
> Currently the ExcelReader service allows the *Schema Access Strategy* 
> parameter to be set to {*}Use Starting Row{*}. With this setting, only the 
> first 10 rows are used to infer the schema of the columns in the sheet, 
> unlike with the *Infer Schema* strategy.
> My user requests that all rows in the sheet be used to infer the schema. 
> Moreover, the service is used in the QueryRecord processor to limit the 
> number of rows read (as described in NIFI-14427). The user wants to infer 
> the schema only from the rows they read, not from all rows in the sheet.
> It would be great to add a parameter to ExcelReader that configures the 
> number of rows read during schema inference (i.e. the 
> {{ExcelHeaderSchemaStrategy#NUM_ROWS_TO_DETERMINE_TYPES}} variable). The 
> parameter could probably be shown conditionally, based on the value of 
> {*}Schema Access Strategy{*}. A value of 0 could have the special meaning 
> that all rows in the sheet should be read. The default value could be 10 to 
> preserve the existing behavior.
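
For comparison, the numeric parameter proposed in the description above could 
be interpreted roughly as follows. This is a sketch under assumptions: the 
class and method names are hypothetical and not taken from the NiFi code 
base; only the default of 10 and the special meaning of 0 come from the 
description.

```java
import java.util.OptionalInt;

// Hypothetical helper: not part of ExcelReader or ExcelHeaderSchemaStrategy.
final class InferenceRowLimit {
    static final int DEFAULT_ROWS = 10; // preserves the existing behavior

    private InferenceRowLimit() {
    }

    // Returns the row limit, or an empty value when every row should be read.
    static OptionalInt resolve(String configured) {
        if (configured == null || configured.isBlank()) {
            return OptionalInt.of(DEFAULT_ROWS);
        }
        int value = Integer.parseInt(configured.trim());
        if (value < 0) {
            throw new IllegalArgumentException("Row count must be 0 or greater: " + value);
        }
        // 0 is the proposed sentinel for "all rows in the sheet"
        return value == 0 ? OptionalInt.empty() : OptionalInt.of(value);
    }
}
```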



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
