[
https://issues.apache.org/jira/browse/NIFI-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929236#comment-17929236
]
Daniel Stieglitz commented on NIFI-14265:
-----------------------------------------
[~exceptionfactory] I believe I understand why the bug reported here is
happening. The ExcelHeaderSchemaStrategy takes the "first row" of subsequent
sheets into account when inferring the schema, while the ExcelRecordReader
does not. Hence, in the reported example, when both sheets are read, the
schema considers the date column to be either a date or a string, while the
actual record data does not include the "first row" from the second sheet. I
just want to confirm that the expected behavior for ExcelReader is for the
configured first row to apply across all sheets in a workbook. Do we want to
consider the ability to configure different starting rows depending on the
sheet? Understandably, this would add complexity. Or do we just want to
document that the first row applies across all sheets?
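
To make the suspected mismatch concrete, here is a minimal sketch in plain
Java. It is not NiFi code: the class and method names are hypothetical, and
it assumes (as one reading of the explanation above) that the schema pass
skips the header row only in the first sheet while the reader pass skips it
in every sheet. The epoch values are taken from the reported CSV output and
are assumed to be UTC milliseconds.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model for illustration only; none of these names exist in NiFi.
final class FirstRowMismatchSketch {

    // Converts epoch milliseconds to a calendar date, assuming UTC.
    static LocalDate fromMillis(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Each sheet is modelled as the cells of its first ("date") column.
    // A cell is either a String (header text) or a LocalDate (date value).
    // Returns the set of type names seen by a pass over all sheets.
    static Set<String> observedTypes(List<List<Object>> sheets, boolean skipHeaderInEverySheet) {
        Set<String> types = new TreeSet<>();
        for (int s = 0; s < sheets.size(); s++) {
            // Assumption: the first sheet's header row is always skipped.
            boolean skipHeader = (s == 0) || skipHeaderInEverySheet;
            List<Object> rows = sheets.get(s);
            for (int r = skipHeader ? 1 : 0; r < rows.size(); r++) {
                types.add(rows.get(r) instanceof LocalDate ? "DATE" : "STRING");
            }
        }
        return types;
    }

    // Two sheets shaped like the reported workbook: a header row, then two dates.
    static List<List<Object>> exampleWorkbook() {
        List<Object> sheet1 = List.of("date", fromMillis(1738368000000L), fromMillis(1707696000000L));
        List<Object> sheet2 = List.of("date", fromMillis(211248000000L), fromMillis(540086400000L));
        return List.of(sheet1, sheet2);
    }

    public static void main(String[] args) {
        List<List<Object>> workbook = exampleWorkbook();
        // Schema-like pass: the second sheet's header row is included in inference.
        System.out.println("schema pass: " + observedTypes(workbook, false));
        // Reader-like pass: the header row is skipped in every sheet.
        System.out.println("reader pass: " + observedTypes(workbook, true));
    }
}
```

In this model the schema-like pass sees both a string and a date in the
column, while the reader-like pass only ever sees dates, which matches the
"date or a string" schema described above.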
> ExcelReader - date stays in milliseconds
> -----------------------------------------
>
> Key: NIFI-14265
> URL: https://issues.apache.org/jira/browse/NIFI-14265
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Philipp Korniets
> Assignee: Daniel Stieglitz
> Priority: Major
> Attachments: Excel Test2.xlsx, image-2025-02-13-11-16-07-111.png,
> image-2025-02-13-11-54-45-830.png, image-2025-02-13-11-55-19-478.png,
> image-2025-02-13-11-57-55-829.png
>
>
> Hi,
> I am observing strange behaviour of ExcelReader.
> Workbook example [^Excel Test2.xlsx]
> Flow
> !image-2025-02-13-11-54-45-830.png|width=1730,height=187!
> Convert to CSV.
> ExcelReader -
> !image-2025-02-13-11-55-19-478.png|width=507,height=329!
> CSVRecordSetWriter
> !image-2025-02-13-11-57-55-829.png|width=505,height=455!
>
> If we don't have the attribute ${sheetName}, ExcelReader combines all sheets
> together (as expected); however, the date stays in milliseconds:
> {code:java}
> date,Something,Name
> 1738368000000,test1,Sheet1
> 1707696000000,test2,Sheet1
> 211248000000,aaa,Sheet2
> 540086400000,sss,Sheet2
> {code}
> If you provide the sheet name, the date is formatted correctly:
> {code:java}
> date,Something,Name
> 2025-02-01,test1,Sheet1
> 2024-02-12,test2,Sheet1
> {code}
>