[ 
https://issues.apache.org/jira/browse/NIFI-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941655#comment-17941655
 ] 

Daniel Stieglitz edited comment on NIFI-14427 at 4/7/25 4:23 PM:
-----------------------------------------------------------------

{quote}Using LIMIT clause in QueryRecord, do I need to ORDER BY rows by some 
column (if yes then by which)? Or can I assume that order of rows is 
deterministic and determined by the order given by read service?
{quote}
I do not think you need ORDER BY. As far as I know, you can assume that the 
order of rows is deterministic and follows the order in which the reader 
service supplies them.

 
{quote}Regarding list of columns, I have N, M numbers denoting which columns 
should be selected (e.g. for file with A, B, C header, and N = 2, M = 3, only 
columns B, C should be selected). In my use case I have files which have 
headers in the first row, which must be non-empty and not duplicated (but only 
in read range of columns, so columns outside of N - M range could be empty or 
duplicated). I need to provide list of columns in SELECT clause. I think of 2 
ways how to achieve this:
{quote}
If you are using the "Use Starting Row" strategy, then every field will have a 
name. Non-empty header cells keep the names already there, and empty header 
cells get a generated name made of the prefix column_ followed by the index of 
the cell in the row, e.g. column_5. I am not sure of the QueryRecord behavior 
if there are duplicate column names.
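
As a rough illustration of the naming described above and of how a column list 
for the N - M range could be derived from it, here is a minimal Python sketch. 
It is not the ExcelReader code: the helper names field_names and select_clause 
are hypothetical, the zero-based index used in the column_ suffix is an 
assumption, and duplicate or quoted column names are not handled.

```python
def field_names(header_cells):
    """Keep non-empty header text; name empty cells column_<index>.

    Assumption: the index in the generated name is zero-based.
    """
    names = []
    for index, cell in enumerate(header_cells):
        text = (cell or "").strip()
        names.append(text if text else f"column_{index}")
    return names


def select_clause(header_cells, n, m):
    """Build a QueryRecord-style SELECT for the 1-based, inclusive range n..m."""
    chosen = field_names(header_cells)[n - 1:m]
    return "SELECT " + ", ".join(chosen) + " FROM FLOWFILE"


# Header A, B, C with N = 2, M = 3 selects only columns B and C.
print(select_clause(["A", "B", "C"], 2, 3))  # SELECT B, C FROM FLOWFILE
```

Because empty header cells outside the N - M range still receive a generated 
name, only the names inside the range need to appear in the SELECT clause.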


> Add new properties for selecting rows and columns to ExcelReader processor
> --------------------------------------------------------------------------
>
>                 Key: NIFI-14427
>                 URL: https://issues.apache.org/jira/browse/NIFI-14427
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Piotr Zalas
>            Priority: Major
>
> Currently the ExcelReader processor has only one property for selecting the 
> row at which reading of the file should start: "Starting Row".
> It would be great to add similar properties: "Ending Row", "Starting 
> Column", "Ending Column".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
