[ 
https://issues.apache.org/jira/browse/NIFI-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003964#comment-18003964
 ] 

ASF subversion and git services commented on NIFI-14579:
--------------------------------------------------------

Commit 3abb62020d90de859c3ed6d08e892a34ae5d6c59 in nifi's branch 
refs/heads/main from dan-s1
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=3abb62020d ]

NIFI-14579 Added Row Evaluation Strategy property for Excel Schema Inference 
(#10000)

- Aligned the Use Starting Row strategy to use a similar approach as the Infer 
Schema strategy
- Added a Row Evaluation Strategy property for using the starting row as the 
schema field names, with either Standard or All Rows options

Signed-off-by: David Handermann <[email protected]>

> Add parameter to configure number of rows used in schema inference with 
> header in ExcelReader service
> -----------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-14579
>                 URL: https://issues.apache.org/jira/browse/NIFI-14579
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Piotr Zalas
>            Assignee: Daniel Stieglitz
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently ExcelReader service allows to configure *Schema Access Strategy* 
> parameter as {*}Use Starting Row{*}. With this parameter, only 10 first rows 
> are used to infer schema of columns in the sheet, as opposed to *Infer 
> Schema* strategy.
> My user requests that all rows in the sheet are used to infer schema. 
> Moreover, the service is used in QueryRecord processor to limit number of 
> rows read (as described in NIFI-14427). The user wants to infer schema only 
> from rows they read, not all rows in the sheet.
> It would be great to add parameter to ExcelReader that allows to configure 
> number of rows read during schema inference (i.e. 
> {{ExcelHeaderSchemaStrategy#NUM_ROWS_TO_DETERMINE_TYPES}} variable). The 
> parameter could probably show conditionally based on value of {*}Schema 
> Access Strategy{*}. Value 0 could have special meaning that all rows in the 
> sheet should be read. The default value could be 10 to preserve existing 
> behavior. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to