[
https://issues.apache.org/jira/browse/NIFI-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941605#comment-17941605
]
Piotr Zalas commented on NIFI-14427:
------------------------------------
I'm not sure. Could you help me understand how it works?
When using a LIMIT clause in QueryRecord, do I need to ORDER BY some column
(and if so, which one)? Or can I assume that the row order is deterministic and
follows the order in which the reader service returns the records?
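As a general SQL point, a LIMIT without ORDER BY does not by itself guarantee which rows come back. Whether QueryRecord preserves the reader's order is not something this sketch settles. It is a minimal illustration using Python's standard sqlite3 module (not NiFi or Calcite), and it assumes a hypothetical row_idx column holding each row's original position in the file:

```python
import sqlite3


def first_rows(rows, limit):
    """Return the first `limit` values by original position via an explicit ORDER BY."""
    conn = sqlite3.connect(":memory:")
    try:
        # row_idx is a hypothetical column holding the 1-based position in the file.
        conn.execute("CREATE TABLE flowfile (row_idx INTEGER, val TEXT)")
        indexed = list(enumerate(rows, start=1))
        # Insert in reverse on purpose: the result must depend only on ORDER BY.
        conn.executemany("INSERT INTO flowfile VALUES (?, ?)", reversed(indexed))
        cur = conn.execute(
            "SELECT val FROM flowfile ORDER BY row_idx LIMIT ?", (limit,)
        )
        return [r[0] for r in cur.fetchall()]
    finally:
        conn.close()
```

With an explicit ordering column the result no longer depends on how the engine happens to scan the rows.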
Regarding the list of columns, I have two numbers, N and M, denoting which
columns should be selected (e.g. for a file with header A, B, C and N = 2,
M = 3, only columns B and C should be selected). In my use case the files have
headers in the first row, and the headers must be non-empty and unique, but
only within the read range of columns; columns outside the N to M range may be
empty or duplicated. I need to provide the list of columns in the SELECT
clause. I can think of two ways to achieve this:
# Extract the schema of the file using ExtractRecordSchema (skipping to the
starting row and reading the first row as the header), write it to the
avro.schema attribute, and then use this attribute in QueryRecord to select
columns. Would this work if two columns in the file have an empty header value,
or if a header contains special characters (e.g. a space in a field name)?
# In the reader service for QueryRecord, don't read the header. Column names
would then be autogenerated as column_1, column_2, ... If the Expression
Language supported generating numbers from N to M, I could map these numbers to
specific column names. But I would then still need to rename these columns to
the values from the header, and I'm not sure how to achieve this.
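The column-range idea can be sketched outside NiFi as follows. This is a minimal Python illustration, not NiFi's API. The function name build_select, the 1-based inclusive N..M convention, and the column_<i> naming for header-less reads (taken from the description above) are all assumptions. It validates headers only inside the selected range and aliases each generated column name to its header value; double-quoted aliases would also cover headers containing spaces, provided the SQL dialect accepts quoted identifiers.

```python
def build_select(header, n, m, table="FLOWFILE"):
    """Build a SELECT over columns n..m (1-based, inclusive), aliased to header names."""
    if not (1 <= n <= m <= len(header)):
        raise ValueError(f"invalid column range {n}..{m} for {len(header)} columns")
    selected = header[n - 1:m]
    # Headers must be non-empty and unique, but only inside the selected range.
    if any(not name.strip() for name in selected):
        raise ValueError("empty header inside the selected range")
    if len(set(selected)) != len(selected):
        raise ValueError("duplicate header inside the selected range")
    parts = []
    for i, name in zip(range(n, m + 1), selected):
        # Quote the alias and escape any embedded double quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        parts.append(f"column_{i} AS {quoted}")
    return f"SELECT {', '.join(parts)} FROM {table}"
```

For a file with header A, B, C and N = 2, M = 3, this yields SELECT column_2 AS "B", column_3 AS "C" FROM FLOWFILE. Empty or duplicated headers outside the range are ignored.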
> Add new properties for selecting rows and columns to ExcelReader processor
> --------------------------------------------------------------------------
>
> Key: NIFI-14427
> URL: https://issues.apache.org/jira/browse/NIFI-14427
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Piotr Zalas
> Priority: Major
>
> Currently the ExcelReader processor has only one property for selecting
> the row at which reading of the file should start: "Starting
> Row".
> It would be great to add similar properties: "Ending Row", "Starting
> Column", "Ending Column".