[ https://issues.apache.org/jira/browse/NIFI-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927872#comment-17927872 ]

Daniel Stieglitz edited comment on NIFI-14265 at 2/17/25 9:46 PM:
------------------------------------------------------------------

It looks like the difference comes from the header row of the second sheet being
read as data. This skews the inferred schema and hence the output. If you configure
the CSVRecordSetWriter to write the schema, you will see the differences. When one
sheet is read, the schema for the date field is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"null"]}{code}
while when both sheets are read, it is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"string","null"]}{code}
Note the added "string" type.
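To illustrate why a stray header row widens the type, here is a minimal Java sketch of union-style type inference. It is a hypothetical simplification, not NiFi's actual inference code: each cell is classified as "date" if it parses as an ISO-8601 date and "string" otherwise, and the column type is the union of what was seen plus "null". The Sheet2 dates are derived from the millisecond values in the report, assuming UTC.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified type inference: NOT NiFi's implementation.
public class TypeWidening {

    // Classify one cell: "date" if it parses as an ISO-8601 date, otherwise "string".
    static String inferCellType(String value) {
        try {
            LocalDate.parse(value);
            return "date";
        } catch (DateTimeParseException e) {
            return "string";
        }
    }

    // Union of the types seen in a column (insertion order kept), always allowing null.
    static Set<String> inferColumnType(List<String> values) {
        Set<String> types = new LinkedHashSet<>();
        for (String v : values) {
            if (v != null) {
                types.add(inferCellType(v));
            }
        }
        types.add("null");
        return types;
    }

    public static void main(String[] args) {
        // Sheet1 data rows only: every value is a date.
        List<String> oneSheet = List.of("2025-02-01", "2024-02-12");
        // Both sheets, with Sheet2's header cell "date" read as if it were data.
        List<String> bothSheets =
                List.of("2025-02-01", "2024-02-12", "date", "1976-09-11", "1987-02-12");
        System.out.println(inferColumnType(oneSheet));   // [date, null]
        System.out.println(inferColumnType(bothSheets)); // [date, string, null]
    }
}
```

Once a single non-date cell such as the header text "date" gets into the column, the union can no longer be a pure date type, which matches the extra "string" in the schema above.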

I would have to see if it's feasible to skip the header rows on both sheets
when using the "Use Starting Row" schema access strategy.
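A minimal sketch of what skipping the header row on every sheet (rather than only the first) could look like, assuming each sheet is modelled as a list of rows of cell strings and that the header sits on the same 1-based starting row in every sheet. `SkipHeaderPerSheet` and `dataRows` are hypothetical names; this is not the ExcelReader API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not NiFi's ExcelReader code: a workbook is modelled as a list of
// sheets, each sheet as a list of rows, each row as a list of cell strings.
public class SkipHeaderPerSheet {

    // Concatenates the data rows of all sheets. The header is assumed to be on the same
    // 1-based startingRow in EVERY sheet, so rows up to and including it are skipped per sheet.
    static List<List<String>> dataRows(List<List<List<String>>> sheets, int startingRow) {
        List<List<String>> result = new ArrayList<>();
        for (List<List<String>> sheet : sheets) {
            // 1-based header row N sits at index N - 1, so data starts at index N.
            for (int i = startingRow; i < sheet.size(); i++) {
                result.add(sheet.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<List<String>>> sheets = List.of(
                List.of(List.of("date", "Something"),
                        List.of("2025-02-01", "test1"),
                        List.of("2024-02-12", "test2")),
                List.of(List.of("date", "Something"),
                        List.of("1976-09-11", "aaa"),
                        List.of("1987-02-12", "sss")));
        // Prints four data rows; neither sheet's "date" header appears among them.
        dataRows(sheets, 1).forEach(System.out::println);
    }
}
```

Applying the skip per sheet keeps header text out of the sampled values, so schema inference would only see date cells in the date column.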


was (Author: JIRAUSER294662):
It looks like the difference comes from the header row of the second sheet being
read as data. This skews the inferred schema and hence the output. If you configure
the CSVRecordSetWriter to write the schema, you will see the differences. When one
sheet is read, the schema for the date field is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"null"]}{code}
while when both sheets are read, it is
{code:java}
{"name":"date","type":[{"type":"int","logicalType":"date"},"string","null"]}{code}
Note the added "string" type.

I would have to see if it's feasible to indicate the header rows on both sheets
so that they are skipped when using the "Use Starting Row" schema access strategy.

> ExcelReader  - date stays in milliseconds
> -----------------------------------------
>
>                 Key: NIFI-14265
>                 URL: https://issues.apache.org/jira/browse/NIFI-14265
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Philipp Korniets
>            Assignee: Daniel Stieglitz
>            Priority: Major
>         Attachments: Excel Test2.xlsx, image-2025-02-13-11-16-07-111.png, 
> image-2025-02-13-11-54-45-830.png, image-2025-02-13-11-55-19-478.png, 
> image-2025-02-13-11-57-55-829.png
>
>
> Hi,
> I'm observing strange behaviour in ExcelReader.
> Example workbook: [^Excel Test2.xlsx]
> Flow:
> !image-2025-02-13-11-54-45-830.png|width=1730,height=187!
> Convert to CSV.
> ExcelReader:
> !image-2025-02-13-11-55-19-478.png|width=507,height=329!
> CSVRecordSetWriter:
> !image-2025-02-13-11-57-55-829.png|width=505,height=455!
>  
> If we don't set the ${sheetName} attribute, ExcelReader combines all sheets
> together (as expected); however, the date stays in milliseconds:
> {code:java}
> date,Something,Name
> 1738368000000,test1,Sheet1
> 1707696000000,test2,Sheet1
> 211248000000,aaa,Sheet2
> 540086400000,sss,Sheet2
>  {code}
> If you provide the sheet name, the date is formatted correctly:
> {code:java}
> date,Something,Name
> 2025-02-01,test1,Sheet1
> 2024-02-12,test2,Sheet1{code}
>  
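The raw values in the combined output are consistent with epoch milliseconds at midnight UTC: the two Sheet1 values convert to exactly the dates shown when a single sheet is read. A small java.time sketch of that conversion, assuming UTC (the class and method names are hypothetical):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: interprets epoch milliseconds as a calendar date in UTC.
public class EpochMillisToDate {

    static LocalDate toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        // Raw values taken from the combined-sheets CSV output above.
        long[] raw = {1738368000000L, 1707696000000L, 211248000000L, 540086400000L};
        for (long millis : raw) {
            System.out.println(millis + " -> " + toUtcDate(millis));
        }
        // 1738368000000 -> 2025-02-01
        // 1707696000000 -> 2024-02-12
        // 211248000000 -> 1976-09-11
        // 540086400000 -> 1987-02-12
    }
}
```

This only shows that the data itself is intact; the problem is that the date formatting is not applied once the field's type has widened.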



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
