svenvk created NIFI-15548:
-----------------------------

             Summary: ParquetReader fails on timestamps
                 Key: NIFI-15548
                 URL: https://issues.apache.org/jira/browse/NIFI-15548
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 2.7.2, 2.7.1, 2.7.0
            Reporter: svenvk
         Attachments: image-2026-02-04-12-58-19-172.png, parquet_test.parquet

It seems that since version 2.7 the ParquetReader has problems with 
timestamps in Parquet data. 

When reading a Parquet file with a timestamp column (timestamp_millis), an 
error is thrown:

!image-2026-02-04-12-58-19-172.png!

 
{noformat}
ConvertRecord[id=59968c39-4fb7-324a-b1b7-bd2e4b93ccb7] Failed to process FlowFile[filename=parquet_test.parquet]; will route to failure: java.lang.ClassCastException: class java.time.Instant cannot be cast to class java.lang.Long (java.time.Instant and java.lang.Long are in module java.base of loader 'bootstrap')
java.lang.ClassCastException: class java.time.Instant cannot be cast to class java.lang.Long (java.time.Instant and java.lang.Long are in module java.base of loader 'bootstrap')
    at org.apache.nifi.avro.AvroTypeUtil.normalizeValue(AvroTypeUtil.java:1175)
    at org.apache.nifi.avro.AvroTypeUtil.lambda$normalizeValue$3(AvroTypeUtil.java:1186)
    at org.apache.nifi.avro.AvroTypeUtil.convertUnionFieldValue(AvroTypeUtil.java:1016)
    at org.apache.nifi.avro.AvroTypeUtil.normalizeValue(AvroTypeUtil.java:1186)
    at org.apache.nifi.avro.AvroTypeUtil.convertAvroRecordToMap(AvroTypeUtil.java:979)
    at org.apache.nifi.avro.AvroTypeUtil.convertAvroRecordToMap(AvroTypeUtil.java:943)
    at org.apache.nifi.parquet.record.ParquetRecordReader.nextRecord(ParquetRecordReader.java:111)
    at org.apache.nifi.serialization.RecordReader.nextRecord(RecordReader.java:50)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:251)
    at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler$ProxiedReturnObjectInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:237)
    at jdk.proxy196/jdk.proxy196.$Proxy501.nextRecord(Unknown Source)
    at org.apache.nifi.processors.standard.AbstractRecordProcessor.lambda$onTrigger$0(AbstractRecordProcessor.java:132)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:3410)
    at org.apache.nifi.processors.standard.AbstractRecordProcessor.onTrigger(AbstractRecordProcessor.java:125)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1274)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:229)
    at org.apache.nifi.controller.scheduling.AbstractTimeBasedSchedulingAgent.lambda$doScheduleOnce$0(AbstractTimeBasedSchedulingAgent.java:59)
    at org.apache.nifi.engine.FlowEngine.lambda$wrap$1(FlowEngine.java:105)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
{noformat}
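Judging from the stack trace, the value handed to AvroTypeUtil.normalizeValue for the timestamp-millis field appears to already be a java.time.Instant, while that code path still expects the raw epoch-millis long. That is only my guess at the root cause; the snippet below is a minimal stand-alone sketch (not NiFi source code) that reproduces the same ClassCastException and shows what a more tolerant conversion could look like:
{code:java}
import java.time.Instant;

public class TimestampCastSketch {
    public static void main(String[] args) {
        // Assumption: the Parquet/Avro reader now hands back an Instant for
        // timestamp-millis columns instead of the raw epoch-millis long.
        Object valueFromReader = Instant.parse("2026-02-04T12:58:19.172Z");

        try {
            // This is the kind of blind cast that fails in AvroTypeUtil.normalizeValue:
            long epochMillis = (Long) valueFromReader;
            System.out.println(epochMillis);
        } catch (ClassCastException e) {
            // "class java.time.Instant cannot be cast to class java.lang.Long"
            System.out.println("Same error as in the flow: " + e.getMessage());
        }

        // A tolerant conversion would have to accept both representations:
        long epochMillis = (valueFromReader instanceof Instant)
                ? ((Instant) valueFromReader).toEpochMilli()
                : (Long) valueFromReader;
        System.out.println(epochMillis);
    }
}
{code}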
Steps to reproduce:
 # Take the attached Parquet test file ( [^parquet_test.parquet] ) and ingest it into NiFi (via GetFile, GetHDFS, ...). 
(This test file was actually produced by an ExecuteSQLRecord processor with a standard Parquet writer; a sketch of how an equivalent file could be generated is shown after this list.)


 # After that, add for example a ConvertRecord processor with a +standard ParquetReader+ and another writer (a JSON writer, for example; the writer doesn't really matter). 
(Any other processor that uses a Parquet reader has the same issue.)


 # Run the flow. An error like the one above will be shown. 
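For reference, an equivalent test file can also be generated outside NiFi. This is only a sketch, assuming parquet-avro (with its Hadoop dependencies) on the classpath; the record, column, and file names are made up for illustration, and the attached parquet_test.parquet itself came from ExecuteSQLRecord as described above:
{code:java}
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class WriteTimestampParquet {
    public static void main(String[] args) throws Exception {
        // Avro long annotated with the timestamp-millis logical type,
        // which maps to Parquet's timestamp_millis.
        Schema tsMillis = LogicalTypes.timestampMillis()
                .addToSchema(Schema.create(Schema.Type.LONG));

        Schema schema = SchemaBuilder.record("test_record").fields()
                .requiredInt("id")
                .name("created_at").type(tsMillis).noDefault()
                .endRecord();

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", 1);
        record.put("created_at", System.currentTimeMillis()); // epoch millis

        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path("parquet_test.parquet"))
                .withSchema(schema)
                .build()) {
            writer.write(record);
        }
    }
}
{code}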

We still have an older NiFi 1.x environment (v1.23.2), and when running the 
exact same flow with the same file there, it works perfectly fine. 

I also found a similar mention of this problem on the Cloudera community: 
[https://community.cloudera.com/t5/Support-Questions/Nifi-2-7-x-seems-to-break-ParquetRecordReader-for-timestamps/m-p/413341]

The reporter there confirms that it still worked as expected in v2.6.0, but not 
in 2.7.

Reading Parquet data is pretty important, so this is a rather blocking issue. 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
