[
https://issues.apache.org/jira/browse/NIFI-14106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925006#comment-17925006
]
Daniel Stieglitz edited comment on NIFI-14106 at 2/11/25 5:14 PM:
------------------------------------------------------------------
[~exceptionfactory] I determined the cause of this ticket was as a result of
the changes made in NIFI-13726. Once the cell styles are not copied the
formatting for date/time are lost and a meaning less number remains. I am not
sure how to proceed as this is a bug I discovered and no one is complaining
about it so far. A potential fix I thought of was to to take the current POI
code which I used to copy the rows and retrofit it to at least preserve the
date/time as an epoch in milliseconds. In the meantime I have submitted a
question to the [POI user mailing
list|https://lists.apache.org/[email protected]] seeking help to
see if there is a way to not to exceed the maximum number of cell styles and
yet have the formatting needed for dates. Please let me know if you have any
suggestions how to proceed. Thanks!
was (Author: JIRAUSER294662):
[~exceptionfactory] I determined the cause of this ticket was as a result of
the changes made in NIFI-13726. Once the cell styles are not copied the
formatting for date/time are lost and a meaning less number remains. I am not
sure how to proceed as this is a bug I discovered and no one is complaining
about it so far. A potential fix I thought of was to to take the current POI
code which I used to copy the rows and retrofit it to at least preserve the
date/time as an epoch in milliseconds. Is that something worth pursuing or
perhaps is this something to ask the POI folks to incorporate in their code?
> Loss of dates and times precision when running Excel with Dates through
> SplitExcel
> ----------------------------------------------------------------------------------
>
> Key: NIFI-14106
> URL: https://issues.apache.org/jira/browse/NIFI-14106
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Daniel Stieglitz
> Assignee: Daniel Stieglitz
> Priority: Major
> Attachments: fileWithIssue.xlsx
>
>
> The SplitExcel processor seems to be dropping date format information needed
> in order to correctly determine dates and times. I set up two flows sending
> the attached file to both:
> +Flow without SplitExcel+
> GetFile -> ConvertRecord (configured with ExcelReader with Schema Access
> Strategy 'Use Starting Row' and CSVRecordSetWriter (configured with Schema
> Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy, Time Format
> HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
> +Flow with SplitExcel+
> GetFile -> SplitExcel -> ConvertRecord (configured with ExcelReader with
> Schema Access Strategy 'Use Starting Row' and CSVRecordSetWriter (configured
> with Schema Access Strategy 'Inherit Record Schema', Date Format MM/dd/yyyy,
> Time Format HH:mm:ss and Timestamp Format MM/dd/yyyy'T'hh:mm:ss)
> The contents from the flow without SplitExcel correctly outputs the dates and
> times
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,01/01/2023,09:53:44
> 78,01/01/2023,09:55:16
> 80,01/01/2023,10:00:39
> 81,01/01/2023,10:03:55
> 82,01/01/2023,10:14:49{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":[{"type":"int","logicalType":"date"},"null"]},{"name":"transaction_time","type":[{"type":"int","logicalType":"time-millis"},"null"]}]}{code}
> while the contents from the flow with SplitExcel does not correctly output
> the dates and times
> {code:java}
> transaction_id,transaction_date,transaction_time
> 75,44927,-1
> 78,44927,-1
> 80,44927,-1
> 81,44927,-1
> 82,44927,-1{code}
> and its schema is
> {code:java}
> {"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"transaction_id","type":["long","null"]},{"name":"transaction_date","type":["long","null"]},{"name":"transaction_time","type":["long","null"]}]}{code}
> Even if the schema for the SplitExcel flow is specified as the one produced
> by flow without the SplitExcel processor and the header from the attached
> file is removed, the results are not the correct date and times
> {code:java}
> 75,01/01/1970,23:59:59
> 78,01/01/1970,23:59:59
> 80,01/01/1970,23:59:59
> 81,01/01/1970,23:59:59
> 82,01/01/1970,23:59:59{code}
> It would seem when copying the contents from the original tab into a new
> Excel spreadsheet (with one tab) that for all dates/times/timestamps the
> correct epoch must be copied over.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)