[ 
https://issues.apache.org/jira/browse/NIFI-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890171#comment-17890171
 ] 

Daniel Stieglitz commented on NIFI-13744:
-----------------------------------------

[~john.wise] There has been a better fix by [~exceptionfactory] to handle your 
use case. With his fix as long as the Excel format for that column is a date 
format, the ExcelReader will infer all dates without having to specify any 
date, time or timestamp patterns. It is only necessary to specify date, time or 
timestamp patterns when the column is in an Excel text format.

> ExcelReader time conversion issues
> ----------------------------------
>
>                 Key: NIFI-13744
>                 URL: https://issues.apache.org/jira/browse/NIFI-13744
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.27.0, 2.0.0-M4
>         Environment: RHEL 8.10, macOS Sequoia 15
>            Reporter: John Wise
>            Assignee: Daniel Stieglitz
>            Priority: Major
>              Labels: Bug, Reader
>             Fix For: 1.28.0, 2.0.0-M5
>
>         Attachments: Excel_Reader_-_Time_Conversion_Issues.xml, test.xlsx
>
>          Time Spent: 23h 50m
>  Remaining Estimate: 0h
>
> [~dstiegli1] - There are actually two issues with time conversion in 
> ExcelReader.  I'm using the following sample set of data to illustrate both 
> issues:
> {code:java}
> Date_Standard: "07/24/24",
> Date_Custom: "07/25/24",
> Time_Standard: "4:00:00 PM",
> Time_Custom: "16:00:00",
> Timestamp: "07/26/24 16:00:00"{code}
> 1. When inferring the schema, all of the Avro types are incorrectly 
> identified as strings:
> {code:java}
> {
>   "type":"record",
>   "name":"nifiRecord",
>   "namespace":"org.apache.nifi",
>   "fields":[
>     {"name":"column_0","type":["string","null"]},
>     {"name":"column_1","type":["string","null"]},
>     {"name":"column_2","type":["string","null"]},
>     {"name":"column_3","type":["string","null"]},
>     {"name":"column_4","type":["string","null"]}
>   ]
> }{code}
> The output consists solely of epoch strings; the fourth value, 
> {{{}"-2208999600000"{}}}, appears to be incorrect, since it isn't inline with 
> the other values at all.
> *Inferred output:*
> {code:java}
> [ {
>   "column_0" : "1721793600000",
>   "column_1" : "1721880000000",
>   "column_2" : "1721851200000",
>   "column_3" : "-2208999600000",
>   "column_4" : "1722024000000"
> } ]{code}
>  
> 2. The second issue occurs when the Avro schema is provided:
> {code:java}
> {
>   "type": "record",
>   "name": "nifiRecord",
>   "namespace": "org.apache.nifi",
>   "fields": [
>     {
>       "name": "Date_Standard",
>       "type": [ "null", { "type": "int", "logicalType": "date" } ]
>     },
>     {
>       "name": "Date_Custom",
>       "type": [ "null", { "type": "int", "logicalType": "date" } ]
>     },
>     {
>       "name": "Time_Standard",
>       "type": [ "null", { "type": "int", "logicalType": "time-millis" } ]
>     },
>     {
>       "name": "Time_Custom",
>       "type": [ "null", { "type": "long", "logicalType": "time-micros" } ]
>     },
>     {
>       "name": "Timestamp",
>       "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } 
> ]
>     }
>   ]
> } 
> {code}
> Conversion of the {{time-millis}} and {{time-micros}} fields both fail with 
> errors similar to this:
> {code:java}
> • 18:01:24 EDT ERROR
> ConvertRecord[id=a098dabc-0191-1000-6d17-3aaa911b2130] Failed to process 
> FlowFile [filename=test.xIsx]; will route to failure: 
> org.apache.nifi.processor.exception.ProcessException: Could not parse 
> incoming data
> - Caused by: org.apache.nifi.serialization.MalformedRecordException: Read 
> next Record from Excel XLSX failed
> - Caused by: 
> org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
> Cannot convert value [Sun Dec 31 16:00:00 EST 1899] of type class 
> java.util.Date to Time for field Time_Custom
> {code}
> Changing the failed types to "string" results in order to allow conversion of 
> the other values results in an epoch output for those values:
> {code:java}
> [ {
>   "Date_Standard" : "07/24/2024",
>   "Date_Custom" : "07/25/2024",
>   "Time_Standard" : "1721851200000",
>   "Time_Custom" : "-2208999600000",
>   "Timestamp" : "07/26/2024 16:00:00"
> } ]{code}
>  
> Given the same data in both JSON and CSV formats, with corresponding Readers, 
> both the inferred and schema-provided outputs are as expected.  This appears 
> to be an issue in ExcelReader.
> I've attached the spreadsheet & a template of the NiFi flow that I've been 
> troubleshooting this with.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to