[jira] [Commented] (NIFI-14427) Add new properties for selecting rows and columns to ExcelReader processor

Piotr Zalas (Jira) Mon, 07 Apr 2025 07:12:06 -0700


    [ 
https://issues.apache.org/jira/browse/NIFI-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941605#comment-17941605
 ]


Piotr Zalas commented on NIFI-14427:
------------------------------------

I'm not sure; could you help me understand how it works?

When using a LIMIT clause in QueryRecord, do I need to ORDER BY some column 
(and if so, which one)? Or can I assume that the row order is deterministic and 
follows the order in which the reader service returns the records?

Regarding the list of columns, I have two numbers, N and M, denoting which 
columns should be selected (e.g. for a file with the header A, B, C and N = 2, 
M = 3, only columns B and C should be selected). In my use case the files have 
headers in the first row, which must be non-empty and not duplicated (but only 
within the read range of columns, so columns outside the N to M range may be 
empty or duplicated). I need to provide the list of columns in the SELECT 
clause. I can think of two ways to achieve this:
 # Extract the schema of the file using ExtractRecordSchema (skipping to the 
starting row and reading the first row as the header), write it to the 
avro.schema attribute, and then use this attribute in QueryRecord to select 
columns. Would this work if two columns in the file have an empty header value, 
or if a column name contains special characters (e.g. a space)?
 # In the reader service for QueryRecord, don't read the header. Column names 
would then be autogenerated as column_1, column_2, ... If the expression 
language supported generating the numbers from N to M, I could map these 
numbers to specific column names. But I would then still need to rename these 
columns to the values from the header, and I'm not sure how to achieve this.
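
To make the second option concrete, here is a minimal Python sketch of the 
mapping I have in mind. It is not NiFi code: the function names are 
hypothetical, the column_N naming is taken from the autogenerated names 
mentioned above, and the double-quoted aliases and the FLOWFILE table name are 
my assumptions about what QueryRecord would accept.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes (assumed SQL style)."""
    return '"' + name.replace('"', '""') + '"'


def select_header_range(header: List[str], n: int, m: int) -> List[str]:
    """Return the header names of 1-based columns n..m (inclusive).

    Only the selected range is validated: names outside it may be
    empty or duplicated.
    """
    if not 1 <= n <= m <= len(header):
        raise ValueError(f"invalid column range {n}-{m} for {len(header)} columns")
    names = [name.strip() for name in header[n - 1:m]]
    if any(not name for name in names):
        raise ValueError("empty header inside the selected range")
    if len(set(names)) != len(names):
        raise ValueError("duplicated header inside the selected range")
    return names


def build_select(header: List[str], n: int, m: int) -> str:
    """Build a SELECT that renames autogenerated column_i names to header values."""
    names = select_header_range(header, n, m)
    columns = ", ".join(
        f"column_{i} AS {quote_identifier(name)}"
        for i, name in enumerate(names, start=n)
    )
    return f"SELECT {columns} FROM FLOWFILE"


print(build_select(["A", "B", "C"], 2, 3))
# prints: SELECT column_2 AS "B", column_3 AS "C" FROM FLOWFILE
```

The open question is how to produce such a statement inside the flow, since the 
header values are only known after the first row has been read.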

> Add new properties for selecting rows and columns to ExcelReader processor
> --------------------------------------------------------------------------
>
>                 Key: NIFI-14427
>                 URL: https://issues.apache.org/jira/browse/NIFI-14427
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Piotr Zalas
>            Priority: Major
>
> Currently the ExcelReader processor has only one property for selecting the 
> row at which reading of the file should start: "Starting Row".
> It would be great to add similar properties: "Ending Row", "Starting 
> Column", "Ending Column".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
