[jira] [Comment Edited] (NIFI-14427) Add new properties for selecting rows and columns to ExcelReader processor

Daniel Stieglitz (Jira) Mon, 07 Apr 2025 09:36:50 -0700


    [ 
https://issues.apache.org/jira/browse/NIFI-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941655#comment-17941655
 ]


Daniel Stieglitz edited comment on NIFI-14427 at 4/7/25 4:33 PM:
-----------------------------------------------------------------

{quote}When using a LIMIT clause in QueryRecord, do I need to ORDER BY some
column (and if so, which one)? Or can I assume that the order of rows is
deterministic and determined by the order given by the reader service?
{quote}
I do not think you need ORDER BY: you can assume that the row order is
deterministic and matches the order in which the reader service returns the rows.
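To make this concrete, here is a minimal sketch of using LIMIT (which QueryRecord's Apache Calcite SQL accepts) to emulate the "Ending Row" requested in this issue. The class and method names are hypothetical, and the sketch assumes the "Starting Row" holds the header (as with the "Use Starting Row" schema strategy), so data begins on the following row.

```java
/**
 * Hypothetical helper (not part of NiFi): emulates an "Ending Row" with a LIMIT
 * clause in a QueryRecord statement. Assumes the ExcelReader "Starting Row" is the
 * header row, as with the "Use Starting Row" schema strategy, so data starts one row later.
 */
final class RowRangeQuery {
    private RowRangeQuery() {}

    /**
     * @param startingRow 1-based worksheet row holding the header ("Starting Row")
     * @param endingRow   1-based last worksheet row to keep, inclusive
     * @return a QueryRecord SQL statement returning only those data rows
     */
    static String limitToEndingRow(int startingRow, int endingRow) {
        if (startingRow < 1 || endingRow < startingRow) {
            throw new IllegalArgumentException("expected 1 <= startingRow <= endingRow");
        }
        // Data rows are startingRow + 1 .. endingRow, i.e. endingRow - startingRow rows.
        int dataRows = endingRow - startingRow;
        // No ORDER BY: relies on rows arriving in the order the reader emits them.
        return "SELECT * FROM FLOWFILE LIMIT " + dataRows;
    }

    public static void main(String[] args) {
        // Header on worksheet row 3, keep rows 4..10.
        System.out.println(limitToEndingRow(3, 10)); // prints: SELECT * FROM FLOWFILE LIMIT 7
    }
}
```

With a schema strategy where the starting row is itself data rather than a header, the count would instead be endingRow - startingRow + 1.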

{quote}Regarding the list of columns, I have numbers N and M denoting which columns
should be selected (e.g. for a file with header A, B, C and N = 2, M = 3, only
columns B and C should be selected). In my use case the files have headers in
the first row, which must be non-empty and not duplicated (but only within the
range of columns being read, so columns outside the N - M range may be empty or
duplicated). I need to provide the list of columns in the SELECT clause. I can
think of 2 ways to achieve this:
{quote}
If you are using the "Use Starting Row" strategy, then every field will
necessarily have a name: header cells that already contain text keep it, and
empty header cells get a name made of the prefix column_ followed by the index
of the cell in the row, e.g. column_5. I am not sure how QueryRecord behaves
when there are duplicate column names.
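The naming rule above, combined with the N..M validation from the question, can be sketched as follows. This is not NiFi code: the class and methods are hypothetical, and the sketch assumes the column_ suffix is the 0-based cell position, which should be checked against the ExcelReader version in use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not NiFi code) of the header handling discussed above.
 * Assumption: the "column_" suffix is the 0-based cell position; check which base
 * your ExcelReader version actually uses.
 */
final class HeaderRange {
    private HeaderRange() {}

    /** Names every header cell, replacing blank cells with "column_" + index. */
    static List<String> resolveNames(List<String> headerCells) {
        List<String> names = new ArrayList<>(headerCells.size());
        for (int i = 0; i < headerCells.size(); i++) {
            String cell = headerCells.get(i);
            boolean blank = cell == null || cell.trim().isEmpty();
            names.add(blank ? "column_" + i : cell);
        }
        return names;
    }

    /**
     * Returns the header cells of columns n..m (1-based, inclusive) and rejects blank
     * or duplicated names inside that range; cells outside it are not checked.
     */
    static List<String> selectRange(List<String> headerCells, int n, int m) {
        if (n < 1 || m < n || m > headerCells.size()) {
            throw new IllegalArgumentException("invalid column range " + n + ".." + m);
        }
        List<String> selected = new ArrayList<>(headerCells.subList(n - 1, m));
        Set<String> seen = new HashSet<>();
        for (String name : selected) {
            if (name == null || name.trim().isEmpty()) {
                throw new IllegalArgumentException("blank header in columns " + n + ".." + m);
            }
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate header: " + name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(resolveNames(List.of("A", "", "C")));       // [A, column_1, C]
        System.out.println(selectRange(List.of("A", "B", "C"), 2, 3)); // [B, C]
    }
}
```

Validating the raw header cells, rather than the generated names, is what lets the sketch reject empty headers inside the range while ignoring empty or duplicated ones outside it.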

{quote}I need to provide the list of columns in the SELECT clause.
{quote}
I think you need to add dynamic properties to QueryRecord whose name is the
name of the relationship and whose value is your SELECT statement. Hence the
column names need to be known beforehand.
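A minimal sketch of building such a property value from a fixed list of column names (for example B and C from the N = 2, M = 3 case). The class, the relationship name "selected", and the Map standing in for the processor configuration are all hypothetical; double-quoted identifiers (standard SQL, Calcite's default) are an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: builds a QueryRecord dynamic property (name = relationship,
 * value = SQL). Assumes double-quoted identifiers (standard SQL, Calcite's default);
 * adjust the quoting if your NiFi version expects another style.
 */
final class SelectBuilder {
    private SelectBuilder() {}

    /** Quotes an identifier, doubling any embedded double quotes. */
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    /** Builds SELECT "c1", "c2", ... FROM FLOWFILE for a fixed list of column names. */
    static String buildSelect(List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        return columns.stream()
                .map(SelectBuilder::quote)
                .collect(Collectors.joining(", ", "SELECT ", " FROM FLOWFILE"));
    }

    /** Stand-in for the processor configuration: relationship name -> SQL. */
    static Map<String, String> dynamicProperties(String relationship, List<String> columns) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(relationship, buildSelect(columns));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(dynamicProperties("selected", List.of("B", "C")));
        // prints: {selected=SELECT "B", "C" FROM FLOWFILE}
    }
}
```

Because the generated SQL is a fixed string, the column names have to be resolved before the flow is configured, which is exactly the constraint described in the comment.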


> Add new properties for selecting rows and columns to ExcelReader processor
> --------------------------------------------------------------------------
>
>                 Key: NIFI-14427
>                 URL: https://issues.apache.org/jira/browse/NIFI-14427
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Piotr Zalas
>            Priority: Major
>
> Currently the ExcelReader processor has only one property for choosing the
> row at which reading of the file should start: "Starting Row".
> It would be great to add similar properties: "Ending Row", "Starting
> Column", and "Ending Column".
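One possible reading of the requested properties is a rectangular window over the worksheet. The sketch below models the sheet as a plain 2-D list of strings; the class, method, and 1-based inclusive bounds are assumptions for illustration and do not describe how ExcelReader is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the requested semantics: keep worksheet rows
 * startRow..endRow and columns startCol..endCol (all 1-based, inclusive).
 * Not ExcelReader's implementation; the sheet is modelled as a 2-D list.
 */
final class CellWindow {
    private CellWindow() {}

    static List<List<String>> window(List<List<String>> sheet,
                                     int startRow, int endRow, int startCol, int endCol) {
        if (startRow < 1 || endRow < startRow || startCol < 1 || endCol < startCol) {
            throw new IllegalArgumentException("invalid row/column window");
        }
        List<List<String>> out = new ArrayList<>();
        // Clamp to the sheet size so an ending row past the last row is harmless.
        int lastRow = Math.min(endRow, sheet.size());
        for (int r = startRow - 1; r < lastRow; r++) {
            List<String> row = sheet.get(r);
            // Rows may be ragged, so clamp the ending column per row.
            int lastCol = Math.min(endCol, row.size());
            List<String> cells = new ArrayList<>();
            for (int c = startCol - 1; c < lastCol; c++) {
                cells.add(row.get(c));
            }
            out.add(cells);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> sheet = List.of(
                List.of("A", "B", "C"),
                List.of("1", "2", "3"),
                List.of("4", "5", "6"),
                List.of("7", "8", "9"));
        System.out.println(window(sheet, 1, 3, 2, 3)); // [[B, C], [2, 3], [5, 6]]
    }
}
```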



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
