[
https://issues.apache.org/jira/browse/NIFI-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927872#comment-17927872
]
Daniel Stieglitz edited comment on NIFI-14265 at 2/17/25 9:51 PM:
------------------------------------------------------------------
It looks like the difference arises because the header row from the second
sheet is read as data. This skews the schema and hence the output. If you
configure the CsvRecordSetWriter to write the schema, you will see the
difference. When only one sheet is read, the schema for the date field is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"null"]}{code}
which is a pure date, so the Date Format specified in the CsvRecordSetWriter
is used. When both sheets are read, it is:
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"string","null"]}{code}
Note the added "string" type. I believe that extra string type causes the field
to be treated as a string, so the Date Format specified in the
CsvRecordSetWriter is not applied to it.
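To make the effect of the two schemas above concrete, here is a minimal toy model in Python. It is not NiFi code: the function names (infer_types, render) are hypothetical, and the fallback to epoch milliseconds is an assumption chosen to match the output in the report. The Sheet2 dates are the ones recovered from the reported millisecond values.

```python
from datetime import date, datetime, timezone


def infer_types(values):
    """Return the distinct kinds seen in a column, in first-seen order."""
    kinds = []
    for value in values:
        kind = "date" if isinstance(value, date) else "string"
        if kind not in kinds:
            kinds.append(kind)
    return kinds


def render(value, kinds, date_format="%Y-%m-%d"):
    """Toy writer: apply the date format only when the column is purely a date."""
    if not isinstance(value, date):
        return str(value)
    if kinds == ["date"]:
        return value.strftime(date_format)
    # Assumed fallback: the date is emitted as epoch milliseconds (UTC midnight).
    midnight = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
    return str(int(midnight.timestamp() * 1000))


sheet1 = [date(2025, 2, 1), date(2024, 2, 12)]
# The header cell "date" of the second sheet is read as a data value.
sheet2_with_header = ["date", date(1976, 9, 11), date(1987, 2, 12)]

one_sheet = infer_types(sheet1)                        # date only
both_sheets = infer_types(sheet1 + sheet2_with_header)  # date plus string

print(render(sheet1[0], one_sheet))
print(render(sheet1[0], both_sheets))
```

With only Sheet1 the column is a pure date and the format is applied. Once the stray header string is included, the same value falls through to the unformatted path.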
I would have to see whether it is feasible to skip the header rows on both
sheets when using the "Use Starting Row" schema access strategy.
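As a rough illustration of that idea only (not a proposed patch and not the ExcelReader API), the sketch below skips the header row on every sheet rather than only on the first. The function name rows_skipping_headers, the header_row parameter and the in-memory sheet layout are all hypothetical.

```python
def rows_skipping_headers(sheets, header_row=0):
    """Collect data rows from all sheets, dropping each sheet's header row.

    `header_row` is the 0-based index of the header on every sheet; rows at
    or before that index are skipped on each sheet, not just on the first one.
    """
    records = []
    for rows in sheets.values():
        records.extend(rows[header_row + 1:])
    return records


# Hypothetical in-memory stand-in for the attached workbook.
sheets = {
    "Sheet1": [
        ["date", "Something", "Name"],
        ["2025-02-01", "test1", "Sheet1"],
        ["2024-02-12", "test2", "Sheet1"],
    ],
    "Sheet2": [
        ["date", "Something", "Name"],
        ["1976-09-11", "aaa", "Sheet2"],
        ["1987-02-12", "sss", "Sheet2"],
    ],
}

records = rows_skipping_headers(sheets)
for record in records:
    print(",".join(record))
```

With the header dropped on each sheet, no header cell reaches the date column, so a schema inferred from these rows would not pick up the extra "string" type.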
was (Author: JIRAUSER294662):
It looks like the difference arises because the header row from the second
sheet is read as data. This skews the schema and hence the output. If you
configure the CsvRecordSetWriter to write the schema, you will see the
difference. When only one sheet is read, the schema for the date field is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"null"]}{code}
whereas when both sheets are read, it is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"string","null"]}{code}
Note the added "string" type.
I would have to see whether it is feasible to skip the header rows on both
sheets when using the "Use Starting Row" schema access strategy.
> ExcelReader - date stays in milliseconds
> -----------------------------------------
>
> Key: NIFI-14265
> URL: https://issues.apache.org/jira/browse/NIFI-14265
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Philipp Korniets
> Assignee: Daniel Stieglitz
> Priority: Major
> Attachments: Excel Test2.xlsx, image-2025-02-13-11-16-07-111.png,
> image-2025-02-13-11-54-45-830.png, image-2025-02-13-11-55-19-478.png,
> image-2025-02-13-11-57-55-829.png
>
>
> Hi,
> I am observing strange behaviour of ExcelReader.
> Workbook example [^Excel Test2.xlsx]
> Flow
> !image-2025-02-13-11-54-45-830.png|width=1730,height=187!
> Convert to CSV.
> ExcelReader -
> !image-2025-02-13-11-55-19-478.png|width=507,height=329!
> CSVRecordSetWriter
> !image-2025-02-13-11-57-55-829.png|width=505,height=455!
>
> If we don't set the attribute ${sheetName}, ExcelReader combines all sheets
> together (as expected); however, the date stays in milliseconds:
> {code:java}
> date,Something,Name
> 1738368000000,test1,Sheet1
> 1707696000000,test2,Sheet1
> 211248000000,aaa,Sheet2
> 540086400000,sss,Sheet2
> {code}
> If you provide the sheet name, the date is formatted correctly:
> {code:java}
> date,Something,Name
> 2025-02-01,test1,Sheet1
> 2024-02-12,test2,Sheet1{code}
>