[ 
https://issues.apache.org/jira/browse/NIFI-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890204#comment-17890204
 ] 

ASF subversion and git services commented on NIFI-13744:
--------------------------------------------------------

Commit f4ee11975eb051894dd75f99efde931f023fb812 in nifi's branch 
refs/heads/support/nifi-1.x from David Handermann
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=f4ee11975e ]

NIFI-13744 Corrected Excel Reader Cell Type Inferencing

- Added Cell Field Type Reader with Timestamp detection

This closes #9397


> ExcelReader time conversion issues
> ----------------------------------
>
>                 Key: NIFI-13744
>                 URL: https://issues.apache.org/jira/browse/NIFI-13744
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.27.0, 2.0.0-M4
>         Environment: RHEL 8.10, macOS Sequoia 15
>            Reporter: John Wise
>            Assignee: Daniel Stieglitz
>            Priority: Major
>              Labels: Bug, Reader
>             Fix For: 1.28.0, 2.0.0-M5
>
>         Attachments: Excel_Reader_-_Time_Conversion_Issues.xml, test.xlsx
>
>          Time Spent: 23h 50m
>  Remaining Estimate: 0h
>
> [~dstiegli1] - There are actually two issues with time conversion in 
> ExcelReader.  I'm using the following sample set of data to illustrate both 
> issues:
> {code:java}
> Date_Standard: "07/24/24",
> Date_Custom: "07/25/24",
> Time_Standard: "4:00:00 PM",
> Time_Custom: "16:00:00",
> Timestamp: "07/26/24 16:00:00"{code}
> 1. When inferring the schema, all of the Avro types are incorrectly 
> identified as strings:
> {code:java}
> {
>   "type":"record",
>   "name":"nifiRecord",
>   "namespace":"org.apache.nifi",
>   "fields":[
>     {"name":"column_0","type":["string","null"]},
>     {"name":"column_1","type":["string","null"]},
>     {"name":"column_2","type":["string","null"]},
>     {"name":"column_3","type":["string","null"]},
>     {"name":"column_4","type":["string","null"]}
>   ]
> }{code}
> The output consists solely of epoch strings; the fourth value, 
> {{{}"-2208999600000"{}}}, appears to be incorrect, since it isn't inline with 
> the other values at all.
> *Inferred output:*
> {code:java}
> [ {
>   "column_0" : "1721793600000",
>   "column_1" : "1721880000000",
>   "column_2" : "1721851200000",
>   "column_3" : "-2208999600000",
>   "column_4" : "1722024000000"
> } ]{code}
>  
> 2. The second issue occurs when the Avro schema is provided:
> {code:java}
> {
>   "type": "record",
>   "name": "nifiRecord",
>   "namespace": "org.apache.nifi",
>   "fields": [
>     {
>       "name": "Date_Standard",
>       "type": [ "null", { "type": "int", "logicalType": "date" } ]
>     },
>     {
>       "name": "Date_Custom",
>       "type": [ "null", { "type": "int", "logicalType": "date" } ]
>     },
>     {
>       "name": "Time_Standard",
>       "type": [ "null", { "type": "int", "logicalType": "time-millis" } ]
>     },
>     {
>       "name": "Time_Custom",
>       "type": [ "null", { "type": "long", "logicalType": "time-micros" } ]
>     },
>     {
>       "name": "Timestamp",
>       "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } 
> ]
>     }
>   ]
> } 
> {code}
> Conversion of the {{time-millis}} and {{time-micros}} fields both fail with 
> errors similar to this:
> {code:java}
> • 18:01:24 EDT ERROR
> ConvertRecord[id=a098dabc-0191-1000-6d17-3aaa911b2130] Failed to process 
> FlowFile [filename=test.xIsx]; will route to failure: 
> org.apache.nifi.processor.exception.ProcessException: Could not parse 
> incoming data
> - Caused by: org.apache.nifi.serialization.MalformedRecordException: Read 
> next Record from Excel XLSX failed
> - Caused by: 
> org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
> Cannot convert value [Sun Dec 31 16:00:00 EST 1899] of type class 
> java.util.Date to Time for field Time_Custom
> {code}
> Changing the failed types to "string" results in order to allow conversion of 
> the other values results in an epoch output for those values:
> {code:java}
> [ {
>   "Date_Standard" : "07/24/2024",
>   "Date_Custom" : "07/25/2024",
>   "Time_Standard" : "1721851200000",
>   "Time_Custom" : "-2208999600000",
>   "Timestamp" : "07/26/2024 16:00:00"
> } ]{code}
>  
> Given the same data in both JSON and CSV formats, with corresponding Readers, 
> both the inferred and schema-provided outputs are as expected.  This appears 
> to be an issue in ExcelReader.
> I've attached the spreadsheet & a template of the NiFi flow that I've been 
> troubleshooting this with.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to