[ 
https://issues.apache.org/jira/browse/NIFI-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929236#comment-17929236
 ] 

Daniel Stieglitz commented on NIFI-14265:
-----------------------------------------

[~exceptionfactory] I believe I understand why the reported bug happens. When 
inferring the schema, ExcelHeaderSchemaStrategy takes the "first row" of 
subsequent sheets into account, while ExcelRecordReader does not. Hence, in 
the reported example, when both sheets are read, the schema considers the date 
column to be either a date or a string, while the actual record data does not 
contain the "first row" from the second sheet. I just want to confirm that the 
expected behavior for ExcelReader is for the configured first row to apply 
across all sheets in a workbook. Do we want to consider the ability to 
configure a different starting row for each sheet? Understandably, this would 
add complexity. Or do we just want to document that the first row applies 
across all sheets?
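
To make the mismatch concrete, here is a minimal sketch. It is a simplified 
model of my reading of the problem, not the actual NiFi code: the class, the 
method names, and the in-memory "sheets" are hypothetical, and the Sheet2 date 
values are illustrative. Schema inference skips the header only on the first 
sheet, while the reader skips it on every sheet.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model only; none of these names exist in NiFi.
public class FirstRowMismatchSketch {

    // Each sheet is a list of rows; row 0 is the header row.
    static final List<List<String>> SHEET1 = List.of(
            List.of("date", "Something", "Name"),
            List.of("2025-02-01", "test1", "Sheet1"),
            List.of("2024-02-12", "test2", "Sheet1"));

    // Illustrative date values for the second sheet.
    static final List<List<String>> SHEET2 = List.of(
            List.of("date", "Something", "Name"),
            List.of("1976-09-11", "aaa", "Sheet2"),
            List.of("1987-02-12", "sss", "Sheet2"));

    // Classifies a cell as DATE if it parses as an ISO date, else STRING.
    static String typeOf(String cell) {
        try {
            LocalDate.parse(cell);
            return "DATE";
        } catch (DateTimeParseException e) {
            return "STRING";
        }
    }

    // Models inference that skips the header on the first sheet only, so the
    // header row of every later sheet is treated as data.
    static Set<String> inferDateColumnTypes(List<List<List<String>>> sheets) {
        Set<String> types = new TreeSet<>();
        for (int s = 0; s < sheets.size(); s++) {
            int start = (s == 0) ? 1 : 0;
            List<List<String>> rows = sheets.get(s);
            for (int r = start; r < rows.size(); r++) {
                types.add(typeOf(rows.get(r).get(0)));
            }
        }
        return types;
    }

    // Models a reader that skips the header row on every sheet.
    static List<String> readDateColumn(List<List<List<String>>> sheets) {
        List<String> values = new ArrayList<>();
        for (List<List<String>> rows : sheets) {
            for (int r = 1; r < rows.size(); r++) {
                values.add(rows.get(r).get(0));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<List<List<String>>> workbook = List.of(SHEET1, SHEET2);
        // Prints: inferred types: [DATE, STRING]
        System.out.println("inferred types: " + inferDateColumnTypes(workbook));
        // Prints only date values; the second header never reaches the records.
        System.out.println("record values:  " + readDateColumn(workbook));
    }
}
```

With only the first sheet, the sketch infers a single DATE type, which matches 
the report that the date is formatted correctly when a sheet name is given.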

> ExcelReader  - date stays in milliseconds
> -----------------------------------------
>
>                 Key: NIFI-14265
>                 URL: https://issues.apache.org/jira/browse/NIFI-14265
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Philipp Korniets
>            Assignee: Daniel Stieglitz
>            Priority: Major
>         Attachments: Excel Test2.xlsx, image-2025-02-13-11-16-07-111.png, 
> image-2025-02-13-11-54-45-830.png, image-2025-02-13-11-55-19-478.png, 
> image-2025-02-13-11-57-55-829.png
>
>
> Hi,
> I am observing strange behaviour of ExcelReader.
> Workbook example: [^Excel Test2.xlsx]
> Flow 
> !image-2025-02-13-11-54-45-830.png|width=1730,height=187!
> Convert to CSV.
> ExcelReader  - 
> !image-2025-02-13-11-55-19-478.png|width=507,height=329!
> CSVRecordSetWriter
> !image-2025-02-13-11-57-55-829.png|width=505,height=455!
>  
> If the ${sheetName} attribute is not set, ExcelReader combines all sheets 
> together (as expected); however, the date stays in milliseconds:
> {code:java}
> date,Something,Name
> 1738368000000,test1,Sheet1
> 1707696000000,test2,Sheet1
> 211248000000,aaa,Sheet2
> 540086400000,sss,Sheet2
>  {code}
> If you provide the sheet name, the date is formatted correctly:
> {code:java}
> date,Something,Name
> 2025-02-01,test1,Sheet1
> 2024-02-12,test2,Sheet1{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
