[
https://issues.apache.org/jira/browse/NIFI-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791146#comment-17791146
]
Philipp Korniets edited comment on NIFI-11167 at 11/29/23 3:51 PM:
-------------------------------------------------------------------
Hi [~dstiegli1]
Basically, you replace the real headers with column_1, column_2, and so on, skipping the real
header row. This can potentially cause a lot of issues:
if the provider adds a new column between column_1 and column_2, what used to be
column_2 becomes column_3, and all further logic is then based on the wrong column.
I think I know where the problem is, but I don't know what the solution should be:
QueryRecord uses "Use String Fields From Header".
!image-2023-11-29-15-51-08-386.png!
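A minimal Java sketch of the column-shift problem described in this comment. It assumes a hypothetical provider layout of `id`, `price` that later gains a `currency` column. The helpers `byPosition` and `byHeader` are also hypothetical and are not part of the ExcelReader. The sketch shows that a positional name such as `column_2` silently changes meaning, while a header-derived name keeps pointing at the same data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeaderVsPositional {

    // Hypothetical helper: names fields column_1, column_2, ... by position,
    // ignoring whatever the real header row says.
    static Map<String, String> byPosition(List<String> row) {
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < row.size(); i++) {
            record.put("column_" + (i + 1), row.get(i));
        }
        return record;
    }

    // Hypothetical helper: names fields from the header row, so each value
    // stays attached to its column name even if columns move.
    static Map<String, String> byHeader(List<String> header, List<String> row) {
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < header.size(); i++) {
            record.put(header.get(i), i < row.size() ? row.get(i) : null);
        }
        return record;
    }

    public static void main(String[] args) {
        // Original (hypothetical) provider layout
        List<String> oldHeader = List.of("id", "price");
        List<String> oldRow = List.of("42", "9.99");
        // Provider inserts a "currency" column between the two existing columns
        List<String> newHeader = List.of("id", "currency", "price");
        List<String> newRow = List.of("42", "USD", "9.99");

        // Positional naming: column_2 switches from the price to the currency
        System.out.println(byPosition(oldRow).get("column_2")); // prints 9.99
        System.out.println(byPosition(newRow).get("column_2")); // prints USD

        // Header naming: "price" still resolves to the price in both layouts
        System.out.println(byHeader(oldHeader, oldRow).get("price")); // prints 9.99
        System.out.println(byHeader(newHeader, newRow).get("price")); // prints 9.99
    }
}
```

Downstream logic keyed on header names survives the inserted column. Logic keyed on `column_N` names needs to be updated by hand whenever the provider changes the layout.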
was (Author: iiojj2):
Hi [~dstiegli1]
Basically, you replace the real headers with column_1, column_2, and so on, skipping the real
header row. This can potentially cause a lot of issues:
if the provider adds a new column between column_1 and column_2, what used to be
column_2 becomes column_3, and all further logic is then based on the wrong column.
> Add Excel Record Reader
> -----------------------
>
> Key: NIFI-11167
> URL: https://issues.apache.org/jira/browse/NIFI-11167
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: David Handermann
> Assignee: Daniel Stieglitz
> Priority: Minor
> Fix For: 2.0.0-M1, 1.23.0
>
> Attachments: CSVRecordSetWriter_configuration.png,
> ExcelReaderConfiguration.png, QueryRecord_configuration.png,
> Test ExcelReader.xlsx, image-2023-11-28-18-22-07-446.png,
> image-2023-11-29-15-51-08-386.png, resulting.csv, screenshot-1.png
>
> Time Spent: 10h 10m
> Remaining Estimate: 0h
>
> A new Excel Record Reader should be implemented to support reading XLSX
> spreadsheet rows as NiFi Records. This Reader will enable integration with
> various record-oriented components, obviating the need for the narrowly
> focused ConvertExcelToCSVProcessor. The initial version of the Excel Reader
> should not support the legacy binary XLS format.
> The ExcelReader should use a library that supports reading from a stream of
> rows to avoid consuming large amounts of heap memory during processing.
> The ExcelReader should support configurable properties to read selected
> sheets. With Excel supporting typed field values, some amount of field type
> mapping will be required. Additional input filtering properties should not be
> implemented, as existing Processors such as QueryRecord already support a wide
> variety of filtering and projection use cases.
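The streaming and type-mapping requirements above can be sketched in plain Java. This is an illustrative sketch under stated assumptions, not the NiFi implementation. Rows arrive as an `Iterator<List<Object>>` standing in for a streaming XLSX parser, which the ticket does not name. Java `Double`, `Boolean` and `LocalDateTime` values stand in for typed Excel cells. The `FieldType` enum and the `typeOf` and `records` helpers are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

class StreamingRowSketch {

    // Hypothetical target types; a real reader would likely map to NiFi's record field types instead.
    enum FieldType { STRING, DOUBLE, BOOLEAN, TIMESTAMP }

    // Maps a Java value (standing in for a typed Excel cell) to a field type.
    static FieldType typeOf(Object cell) {
        if (cell instanceof Double) return FieldType.DOUBLE;
        if (cell instanceof Boolean) return FieldType.BOOLEAN;
        if (cell instanceof LocalDateTime) return FieldType.TIMESTAMP;
        return FieldType.STRING; // text, blank (null) and anything unrecognized
    }

    // Turns a row iterator into a record iterator. The first row supplies the
    // field names; each later row is converted only when next() is called,
    // so this wrapper never holds the whole sheet in memory.
    static Iterator<Map<String, Object>> records(Iterator<List<Object>> rows) {
        if (!rows.hasNext()) {
            return Collections.emptyIterator();
        }
        List<String> names = new ArrayList<>();
        for (Object cell : rows.next()) {
            names.add(String.valueOf(cell));
        }
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                return rows.hasNext();
            }

            @Override
            public Map<String, Object> next() {
                if (!rows.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<Object> row = rows.next();
                Map<String, Object> record = new LinkedHashMap<>();
                for (int i = 0; i < names.size(); i++) {
                    // Missing trailing cells become null rather than shifting values
                    record.put(names.get(i), i < row.size() ? row.get(i) : null);
                }
                return record;
            }
        };
    }

    public static void main(String[] args) {
        // In-memory stand-in for a streamed sheet: header row, then typed rows
        List<List<Object>> sheet = List.of(
                List.<Object>of("id", "price", "active"),
                List.<Object>of(1.0, 9.99, true),
                List.<Object>of(2.0, 4.5, false));
        Iterator<Map<String, Object>> it = records(sheet.iterator());
        while (it.hasNext()) {
            Map<String, Object> record = it.next();
            Map<String, FieldType> types = new LinkedHashMap<>();
            record.forEach((name, value) -> types.put(name, typeOf(value)));
            System.out.println(record + " " + types);
        }
    }
}
```

Because `records` pulls one row from its source per `next()` call, the sketch's memory use depends on row size rather than sheet size. This holds only if the underlying row source is itself streaming.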
--
This message was sent by Atlassian Jira
(v8.20.10#820010)