Hi all,

I am encountering an issue with how DATE columns are handled when I use
ExecuteSQLRecord to query an Oracle database and write the result set as
Parquet. The DATE values are shifted to UTC on output (Oracle's DATE type
stores both date and time, but no time zone).
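
To make the shift concrete, here is a minimal standalone sketch of what I
believe is happening (the value and time zone below are made up for
illustration): the driver materializes the DATE as a wall-clock
java.sql.Timestamp in the JVM default zone, and converting that to epoch
milliseconds, which is what a timestamp-millis field ultimately stores,
re-expresses the value in UTC.

    import java.sql.Timestamp;
    import java.time.Instant;
    import java.util.TimeZone;

    public class DateShiftDemo {
        public static void main(String[] args) {
            // Hypothetical non-UTC JVM default zone (UTC-3, no DST).
            TimeZone.setDefault(TimeZone.getTimeZone("America/Sao_Paulo"));

            // The JDBC driver hands back the Oracle DATE as a wall-clock
            // java.sql.Timestamp interpreted in the JVM default zone.
            Timestamp ts = Timestamp.valueOf("2024-01-15 10:30:00");

            // Converting to epoch millis (what a timestamp-millis field
            // stores) re-expresses that wall clock in UTC.
            Instant instant = ts.toInstant();
            System.out.println(instant); // prints 2024-01-15T13:30:00Z
        }
    }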

While this conversion is generally acceptable, I have observed different
behavior in Azure Data Factory (ADF): there, DATE columns are written to
Parquet without any timezone conversion.

This discrepancy is a problem for my system because I have a pipeline in
ADF and a flow in NiFi that perform the same task, with each serving as a
backup for the other, so it is crucial that the Parquet files generated by
NiFi and ADF be identical.

While examining NiFi's source code, I found the convertToAvroStream
method in JdbcCommon.java, which looked relevant, but it appears that
ParquetRecordSetWriter does not use it.

Does anyone know how to configure a NiFi flow to match ADF's behavior for
DATE fields, without making global, static changes that would affect all
NiFi flows?

Alternatively, could anyone give me hints on how to create a custom
ExecuteSQLRecord or ParquetRecordSetWriter that does not perform this
timezone conversion?
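
For the custom route, my rough idea is the sketch below; it is not actual
NiFi code, the class and method names (ZoneFreeDates, toZoneFreeValue) are
made up, and it assumes a JDBC 4.2 driver that supports
getObject(int, Class). The intent is to read the column as a
java.time.LocalDateTime, which carries no zone and so undergoes no shift,
and hand the record writer a plain string.

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    public class ZoneFreeDates {
        private static final DateTimeFormatter FMT =
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

        // Read a DATE/TIMESTAMP column as a LocalDateTime (no zone
        // attached, so no shift) and emit it as a plain string.
        static String toZoneFreeValue(ResultSet rs, int column)
                throws SQLException {
            LocalDateTime ldt = rs.getObject(column, LocalDateTime.class);
            return ldt == null ? null : ldt.format(FMT);
        }
    }

If string output is not acceptable, I understand the Parquet format also
defines a TIMESTAMP logical type with isAdjustedToUTC=false (local
semantics), which sounds closer to what ADF produces, but I have not
found a way to select that from the writer.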

Thanks in advance.

Best regards,

Eduardo Fontes
