[jira] [Comment Edited] (NIFI-11167) Add Excel Record Reader

Philipp Korniets (Jira) Wed, 29 Nov 2023 09:35:12 -0800


    [ 
https://issues.apache.org/jira/browse/NIFI-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791207#comment-17791207
 ]


Philipp Korniets edited comment on NIFI-11167 at 11/29/23 5:34 PM:
-------------------------------------------------------------------

I might be wrong, but your suggestion has its limitations.
For example, let's assume I have a schema for the above file:

{code:java}
{
    "type": "record",
    "name": "test",
    "fields": [
        {"name":"Descr","type":["string","null"]},
        {"name":"empty","type":["string","null"]},
        {"name":"Events","type":["string","null"]},
        {"name":"CostPerEvent","type":["string","null"]},
        {"name":"Amount","type":["double","null"]}
    ]
}
{code}

and my select statement is *select sum(Amount) from flowfile*. All works well.
Next, the file provider adds a column before Amount, called *MonthlyCost*.
My schema will still work, but the calculation will be based on a completely 
different column/values. In the scenario where CSVReader has *Use String Fields 
From Header*, this will not happen.

We work with a lot of data providers, and they often change the order of the 
fields. So to avoid any miscalculations we rely on field names as they come 
from the file rather than on schemas.
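
To make the scenario concrete, here is a minimal sketch in plain Python. It is 
not NiFi code and does not reproduce how any NiFi reader is implemented; it only 
contrasts binding schema names to columns by position with taking names from the 
file's own header row. The function names and the sample rows are made up for 
illustration.

```python
import csv
import io

# Field names from the schema above, in the order the schema declares them.
SCHEMA_FIELDS = ["Descr", "empty", "Events", "CostPerEvent", "Amount"]


def sum_amount_by_position(csv_text):
    """Assign schema names to columns by position, ignoring the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # drop the header row
    amount_index = SCHEMA_FIELDS.index("Amount")
    return sum(float(row[amount_index]) for row in rows)


def sum_amount_by_header(csv_text):
    """Take field names from the header row of the file itself."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["Amount"]) for row in reader)


# Hypothetical sample data: the original layout, and the layout after the
# provider inserts MonthlyCost before Amount.
ORIGINAL = (
    "Descr,empty,Events,CostPerEvent,Amount\n"
    "A,,2,5,10\n"
    "B,,3,5,15\n"
)
CHANGED = (
    "Descr,empty,Events,CostPerEvent,MonthlyCost,Amount\n"
    "A,,2,5,100,10\n"
    "B,,3,5,200,15\n"
)

if __name__ == "__main__":
    print(sum_amount_by_position(ORIGINAL), sum_amount_by_header(ORIGINAL))
    print(sum_amount_by_position(CHANGED), sum_amount_by_header(CHANGED))
```

With the original layout both approaches agree. After the column is inserted, 
the positional version sums the MonthlyCost values without raising any error, 
while the header-based version still sums Amount.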


> Add Excel Record Reader
> -----------------------
>
>                 Key: NIFI-11167
>                 URL: https://issues.apache.org/jira/browse/NIFI-11167
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: David Handermann
>            Assignee: Daniel Stieglitz
>            Priority: Minor
>             Fix For: 2.0.0-M1, 1.23.0
>
>         Attachments: CSVRecordSetWriter_configuration.png, 
> ExcelReaderConfiguration.png, QueryRecord_configuration.png, Test 
> ExcelReader.xlsx, image-2023-11-28-18-22-07-446.png, 
> image-2023-11-29-15-51-08-386.png, resulting.csv, screenshot-1.png
>
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> A new Excel Record Reader should be implemented to support reading XLSX 
> spreadsheet rows as NiFi Records. This Reader will enable integration with 
> various record-oriented components, obviating the need for the narrowly 
> focused ConvertExcelToCSVProcessor. The initial version of the Excel Reader 
> should not support the legacy binary XLS format.
> The ExcelReader should use a library that supports reading from a stream of 
> rows to avoid consuming large amounts of heap memory during processing.
> The ExcelReader should support configurable properties to read selected 
> sheets. With Excel supporting typed field values, some amount of field type 
> mapping will be required. Additional input filtering properties should not be 
> implemented as existing Processors like QueryRecord support a wide variety of 
> filtering and projection use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
