[
https://issues.apache.org/jira/browse/NIFI-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941655#comment-17941655
]
Daniel Stieglitz edited comment on NIFI-14427 at 4/7/25 4:33 PM:
-----------------------------------------------------------------
{quote}Using LIMIT clause in QueryRecord, do I need to ORDER BY rows by some
column (if yes then by which)? Or can I assume that order of rows is
deterministic and determined by the order given by read service?
{quote}
I do not think you need ORDER BY; you can assume that the order of rows is
deterministic and follows the order in which the reader service returns them.
{quote}Regarding list of columns, I have N, M numbers denoting which columns
should be selected (e.g. for file with A, B, C header, and N = 2, M = 3, only
columns B, C should be selected). In my use case I have files which have
headers in the first row, which must be non-empty and not duplicated (but only
in read range of columns, so columns outside of N - M range could be empty or
duplicated). I need to provide list of columns in SELECT clause. I think of 2
ways how to achieve this:
{quote}
If you are using the "Use Starting Row" strategy, then every field will have a
name: non-empty header cells keep their names, and each empty cell gets a name
made of the prefix column_ followed by the cell's index in the row, e.g.
column_5. I am not sure how QueryRecord behaves if there are duplicate column
names.
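As a rough illustration of that naming rule, here is a small Python sketch. It
is not NiFi code: the function name field_names is hypothetical, and whether
the index suffix starts at 0 or 1 is an assumption, so it is exposed as a
parameter.

```python
def field_names(header_cells, first_index=0):
    """Return one field name per header cell.

    Non-empty cells keep their text; empty cells become "column_<index>".
    first_index is an assumption: the comment does not state whether the
    index suffix is zero-based or one-based.
    """
    names = []
    for position, cell in enumerate(header_cells, start=first_index):
        text = (cell or "").strip()
        names.append(text if text else f"column_{position}")
    return names


print(field_names(["A", "", "C", None]))
```

With the default zero-based assumption, the second and fourth cells are named
column_1 and column_3.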
{quote}I need to provide list of columns in SELECT clause.
{quote}
I think you need to add properties to QueryRecord whose key is the name of the
relationship and whose value is your SELECT statement. Hence you need to know
the column names beforehand.
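If the header row is known before the flow runs, the SELECT statement could be
generated outside NiFi. The Python sketch below assumes N and M are 1-based and
inclusive, as in the reporter's A, B, C example; build_select is a hypothetical
helper, and it does not quote column names, so names that need quoting are out
of scope here.

```python
def build_select(header, n, m, limit=None):
    """Build a QueryRecord-style SELECT over header columns n..m.

    n and m are 1-based and inclusive (an assumption taken from the
    example in the question). FLOWFILE is the table name QueryRecord
    queries against.
    """
    if not 1 <= n <= m <= len(header):
        raise ValueError("column range is outside the header")
    selected = header[n - 1:m]
    # Only the selected range has to be non-empty and free of duplicates.
    if any(not name for name in selected):
        raise ValueError("empty header inside the selected range")
    if len(set(selected)) != len(selected):
        raise ValueError("duplicate header inside the selected range")
    sql = "SELECT " + ", ".join(selected) + " FROM FLOWFILE"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


print(build_select(["A", "B", "C"], 2, 3, limit=10))
```

The returned string would be used as the value of the property, with the
relationship name as its key.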
> Add new properties for selecting rows and columns to ExcelReader processor
> --------------------------------------------------------------------------
>
> Key: NIFI-14427
> URL: https://issues.apache.org/jira/browse/NIFI-14427
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Piotr Zalas
> Priority: Major
>
> Currently the ExcelReader processor has only one property for selecting
> where reading of the file should start: "Starting Row".
> It would be great to add similar properties: "Ending Row", "Starting
> Column", "Ending Column".