[
https://issues.apache.org/jira/browse/NIFI-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890204#comment-17890204
]
ASF subversion and git services commented on NIFI-13744:
--------------------------------------------------------
Commit f4ee11975eb051894dd75f99efde931f023fb812 in nifi's branch
refs/heads/support/nifi-1.x from David Handermann
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=f4ee11975e ]
NIFI-13744 Corrected Excel Reader Cell Type Inferencing
- Added Cell Field Type Reader with Timestamp detection
This closes #9397
> ExcelReader time conversion issues
> ----------------------------------
>
> Key: NIFI-13744
> URL: https://issues.apache.org/jira/browse/NIFI-13744
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.27.0, 2.0.0-M4
> Environment: RHEL 8.10, macOS Sequoia 15
> Reporter: John Wise
> Assignee: Daniel Stieglitz
> Priority: Major
> Labels: Bug, Reader
> Fix For: 1.28.0, 2.0.0-M5
>
> Attachments: Excel_Reader_-_Time_Conversion_Issues.xml, test.xlsx
>
> Time Spent: 23h 50m
> Remaining Estimate: 0h
>
> [~dstiegli1] - There are actually two issues with time conversion in
> ExcelReader. I'm using the following sample set of data to illustrate both
> issues:
> {code:java}
> Date_Standard: "07/24/24",
> Date_Custom: "07/25/24",
> Time_Standard: "4:00:00 PM",
> Time_Custom: "16:00:00",
> Timestamp: "07/26/24 16:00:00"{code}
> 1. When inferring the schema, all of the Avro types are incorrectly
> identified as strings:
> {code:java}
> {
> "type":"record",
> "name":"nifiRecord",
> "namespace":"org.apache.nifi",
> "fields":[
> {"name":"column_0","type":["string","null"]},
> {"name":"column_1","type":["string","null"]},
> {"name":"column_2","type":["string","null"]},
> {"name":"column_3","type":["string","null"]},
> {"name":"column_4","type":["string","null"]}
> ]
> }{code}
> The output consists solely of epoch strings; the fourth value,
> {{{}"-2208999600000"{}}}, appears to be incorrect, since it isn't inline with
> the other values at all.
> *Inferred output:*
> {code:java}
> [ {
> "column_0" : "1721793600000",
> "column_1" : "1721880000000",
> "column_2" : "1721851200000",
> "column_3" : "-2208999600000",
> "column_4" : "1722024000000"
> } ]{code}
>
> 2. The second issue occurs when the Avro schema is provided:
> {code:java}
> {
> "type": "record",
> "name": "nifiRecord",
> "namespace": "org.apache.nifi",
> "fields": [
> {
> "name": "Date_Standard",
> "type": [ "null", { "type": "int", "logicalType": "date" } ]
> },
> {
> "name": "Date_Custom",
> "type": [ "null", { "type": "int", "logicalType": "date" } ]
> },
> {
> "name": "Time_Standard",
> "type": [ "null", { "type": "int", "logicalType": "time-millis" } ]
> },
> {
> "name": "Time_Custom",
> "type": [ "null", { "type": "long", "logicalType": "time-micros" } ]
> },
> {
> "name": "Timestamp",
> "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" }
> ]
> }
> ]
> }
> {code}
> Conversion of the {{time-millis}} and {{time-micros}} fields both fail with
> errors similar to this:
> {code:java}
> • 18:01:24 EDT ERROR
> ConvertRecord[id=a098dabc-0191-1000-6d17-3aaa911b2130] Failed to process
> FlowFile [filename=test.xIsx]; will route to failure:
> org.apache.nifi.processor.exception.ProcessException: Could not parse
> incoming data
> - Caused by: org.apache.nifi.serialization.MalformedRecordException: Read
> next Record from Excel XLSX failed
> - Caused by:
> org.apache.nifi.serialization.record.util.IllegalTypeConversionException:
> Cannot convert value [Sun Dec 31 16:00:00 EST 1899] of type class
> java.util.Date to Time for field Time_Custom
> {code}
> Changing the failed types to "string" results in order to allow conversion of
> the other values results in an epoch output for those values:
> {code:java}
> [ {
> "Date_Standard" : "07/24/2024",
> "Date_Custom" : "07/25/2024",
> "Time_Standard" : "1721851200000",
> "Time_Custom" : "-2208999600000",
> "Timestamp" : "07/26/2024 16:00:00"
> } ]{code}
>
> Given the same data in both JSON and CSV formats, with corresponding Readers,
> both the inferred and schema-provided outputs are as expected. This appears
> to be an issue in ExcelReader.
> I've attached the spreadsheet & a template of the NiFi flow that I've been
> troubleshooting this with.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)