[
https://issues.apache.org/jira/browse/NIFI-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927872#comment-17927872
]
Daniel Stieglitz edited comment on NIFI-14265 at 2/17/25 9:46 PM:
------------------------------------------------------------------
It looks like the difference arises because the header row from the second
sheet is read as data. This skews the schema and hence the output. If you
configure the CSVRecordSetWriter to write the schema, you will see the
difference. When one sheet is read, the schema for the date field is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"null"]}{code}
while when both sheets are read, it is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"string","null"]}{code}
Note the added "string" type.
I would have to see if it's feasible to skip header rows on both sheets when
using the "Use Starting Row" schema access strategy.
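To illustrate the effect, here is a minimal sketch in Python. It is not NiFi's actual inference code; the helper names (infer_type, infer_column_types) and the simplified type names are made up for illustration, and the Sheet2 dates are derived from the millisecond values in the report assuming UTC. It only shows how a single header row read as data adds "string" to the types seen in the date column.

```python
from datetime import date

def infer_type(value):
    """Map one cell value to a simplified type name (illustrative only)."""
    if isinstance(value, date):
        return "date"
    return "string"

def infer_column_types(rows, column):
    """Collect every type seen in one column, in first-seen order.

    Empty cells are skipped, and "null" is appended because every field
    is treated as nullable in this sketch.
    """
    seen = []
    for row in rows:
        value = row[column]
        if value is None:
            continue
        type_name = infer_type(value)
        if type_name not in seen:
            seen.append(type_name)
    return seen + ["null"]

header = ["date", "Something", "Name"]
sheet1_rows = [
    [date(2025, 2, 1), "test1", "Sheet1"],
    [date(2024, 2, 12), "test2", "Sheet1"],
]
# Dates derived from the reported millisecond values, assuming UTC.
sheet2_rows = [
    [date(1976, 9, 11), "aaa", "Sheet2"],
    [date(1987, 2, 12), "sss", "Sheet2"],
]

# One sheet: the header row is skipped, so only dates are seen.
one_sheet = infer_column_types(sheet1_rows, 0)

# Both sheets: the second sheet's header row is read as a data row.
both_sheets = infer_column_types(sheet1_rows + [header] + sheet2_rows, 0)

print(one_sheet)    # ['date', 'null']
print(both_sheets)  # ['date', 'string', 'null']
```

The only difference between the two calls is the extra header row, which matches the difference between the two schemas above.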
> ExcelReader - date stays in milliseconds
> -----------------------------------------
>
> Key: NIFI-14265
> URL: https://issues.apache.org/jira/browse/NIFI-14265
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Philipp Korniets
> Assignee: Daniel Stieglitz
> Priority: Major
> Attachments: Excel Test2.xlsx, image-2025-02-13-11-16-07-111.png,
> image-2025-02-13-11-54-45-830.png, image-2025-02-13-11-55-19-478.png,
> image-2025-02-13-11-57-55-829.png
>
>
> Hi,
> I am observing strange behaviour of ExcelReader.
> Workbook example [^Excel Test2.xlsx]
> Flow
> !image-2025-02-13-11-54-45-830.png|width=1730,height=187!
> Convert to CSV.
> ExcelReader -
> !image-2025-02-13-11-55-19-478.png|width=507,height=329!
> CSVRecordSetWriter
> !image-2025-02-13-11-57-55-829.png|width=505,height=455!
>
> If we don't have the attribute ${sheetName}, ExcelReader combines all sheets
> together (as expected); however, the date stays in milliseconds:
> {code:java}
> date,Something,Name
> 1738368000000,test1,Sheet1
> 1707696000000,test2,Sheet1
> 211248000000,aaa,Sheet2
> 540086400000,sss,Sheet2
> {code}
> If you provide the sheet name, the date is formatted correctly:
> {code:java}
> date,Something,Name
> 2025-02-01,test1,Sheet1
> 2024-02-12,test2,Sheet1{code}
>
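For reference, the millisecond values in the first output appear to be the same dates expressed as milliseconds since the Unix epoch. A small Python sketch to check this, assuming the values are UTC (the helper name millis_to_iso_date is made up for illustration):

```python
from datetime import datetime, timezone

def millis_to_iso_date(millis):
    """Convert epoch milliseconds to an ISO date string, assuming UTC."""
    seconds = millis // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()

# Sheet1 values from the combined output in the report.
for millis in (1738368000000, 1707696000000):
    print(millis, "->", millis_to_iso_date(millis))
# 1738368000000 -> 2025-02-01
# 1707696000000 -> 2024-02-12
```

Both results match the Sheet1 dates written when the sheet name is provided, so the values themselves look intact and only the formatting differs.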
--
This message was sent by Atlassian Jira
(v8.20.10#820010)