[ 
https://issues.apache.org/jira/browse/NIFI-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927872#comment-17927872
 ] 

Daniel Stieglitz edited comment on NIFI-14265 at 2/17/25 10:33 PM:
-------------------------------------------------------------------

It looks like the difference is that the header row from the second sheet is 
read as data. This skews the inferred schema and hence the output. If you 
configure the CsvRecordSetWriter to write the schema, you will see the 
difference. When only one sheet is read, the schema for the date field is 
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"null"]}{code}
which is a pure date, so the Date Format specified in the CsvRecordSetWriter 
is applied. When both sheets are read, it is:
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"string","null"]}{code}
Note the added "string" type. I believe that extra string type means the field 
is no longer a pure date, so the Date Format specified in the 
CsvRecordSetWriter is not applied to it.
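
As a rough illustration of the reasoning above, here is a minimal Python sketch. It is not NiFi code: infer_field_type and write_value are hypothetical helpers that only model the behaviour described in this comment, on the assumption that a single string value in a date column widens the inferred type to a union and that the writer then falls back to epoch milliseconds.

```python
import calendar
from datetime import date


def infer_field_type(values):
    """Collect the simple type names seen in one column (hypothetical model)."""
    kinds = set()
    for value in values:
        if value is None:
            kinds.add("null")
        elif isinstance(value, date):
            kinds.add("date")
        else:
            kinds.add("string")
    return sorted(kinds)


def write_value(value, field_type, date_format="%Y-%m-%d"):
    """Apply the date format only when the column is a pure (nullable) date."""
    if value is None:
        return ""
    if not isinstance(value, date):
        return str(value)
    non_null = [kind for kind in field_type if kind != "null"]
    if non_null == ["date"]:
        return value.strftime(date_format)
    # Assumed fallback for a mixed union: epoch milliseconds at UTC midnight.
    return str(calendar.timegm(value.timetuple()) * 1000)


# Sheet1 dates taken from the report; "date" stands for the second sheet's
# header row read as data; 1976-09-11 is derived from 211248000000 ms.
one_sheet = [date(2025, 2, 1), date(2024, 2, 12)]
both_sheets = one_sheet + ["date", date(1976, 9, 11)]

print(write_value(one_sheet[0], infer_field_type(one_sheet)))    # 2025-02-01
print(write_value(one_sheet[0], infer_field_type(both_sheets)))  # 1738368000000
```

The only difference between the two calls is the stray header string, which mirrors the two schemas shown above.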


was (Author: JIRAUSER294662):
It looks like the difference is that the header row from the second sheet is 
read as data. This skews the inferred schema and hence the output. If you 
configure the CsvRecordSetWriter to write the schema, you will see the 
difference. When only one sheet is read, the schema for the date field is 
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"null"]}{code}
which is a pure date, so the Date Format specified in the CsvRecordSetWriter 
is applied. When both sheets are read, it is:
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"string","null"]}{code}
Note the added "string" type. I believe that extra string type means the field 
is no longer a pure date, so the Date Format specified in the 
CsvRecordSetWriter is not applied to it.

I would have to see whether it is feasible to skip the header row on both 
sheets when using the "Use Starting Row" schema access strategy, so that the 
inferred schema treats this field as a date only.
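
To make the idea concrete, here is a small Python sketch of the two behaviours. It is only a model under stated assumptions, not the ExcelReader implementation: read_rows, header_rows and skip_on_every_sheet are hypothetical names, and each sheet is represented as a plain list of rows.

```python
def read_rows(sheets, header_rows=1, skip_on_every_sheet=True):
    """Concatenate rows from all sheets, dropping header rows (hypothetical model).

    With skip_on_every_sheet=False only the first sheet's header is dropped,
    so later headers leak into the data, as described in this comment.
    """
    records = []
    for index, rows in enumerate(sheets):
        skip = header_rows if (skip_on_every_sheet or index == 0) else 0
        records.extend(rows[skip:])
    return records


header = ["date", "Something", "Name"]
sheets = [
    [header, ["2025-02-01", "test1", "Sheet1"]],
    [header, ["1976-09-11", "aaa", "Sheet2"]],  # date derived from 211248000000 ms
]

leaky = read_rows(sheets, skip_on_every_sheet=False)
clean = read_rows(sheets, skip_on_every_sheet=True)

print(len(leaky), len(clean))  # 3 2
```

In the first call the second sheet's header row survives as a record, which is the case that would add "string" to the date field's type.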

> ExcelReader  - date stays in milliseconds
> -----------------------------------------
>
>                 Key: NIFI-14265
>                 URL: https://issues.apache.org/jira/browse/NIFI-14265
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Philipp Korniets
>            Assignee: Daniel Stieglitz
>            Priority: Major
>         Attachments: Excel Test2.xlsx, image-2025-02-13-11-16-07-111.png, 
> image-2025-02-13-11-54-45-830.png, image-2025-02-13-11-55-19-478.png, 
> image-2025-02-13-11-57-55-829.png
>
>
> Hi,
> I am observing strange behaviour in ExcelReader.
> Workbook  example [^Excel Test2.xlsx]
> Flow 
> !image-2025-02-13-11-54-45-830.png|width=1730,height=187!
> Convert to CSV.
> ExcelReader  - 
> !image-2025-02-13-11-55-19-478.png|width=507,height=329!
> CSVRecordSetWriter
> !image-2025-02-13-11-57-55-829.png|width=505,height=455!
>  
> If we don't have the ${sheetName} attribute, ExcelReader combines all sheets 
> together (as expected); however, the date stays in milliseconds:
> {code:java}
> date,Something,Name
> 1738368000000,test1,Sheet1
> 1707696000000,test2,Sheet1
> 211248000000,aaa,Sheet2
> 540086400000,sss,Sheet2
>  {code}
> If you provide the sheet name, the date is formatted correctly:
> {code:java}
> date,Something,Name
> 2025-02-01,test1,Sheet1
> 2024-02-12,test2,Sheet1{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
