[ https://issues.apache.org/jira/browse/NIFI-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bjorn Olsen updated NIFI-4182:
------------------------------
    Description: 
Type Coercion rules currently allow for the following conversions regarding 
Date, Time and Timestamp fields:

* Any "date/time" type (Date, Time, Timestamp) can be coerced into any other 
"date/time" type.
* Any "date/time" type can be coerced into a Long type, representing the number 
of milliseconds since epoch (Midnight GMT, January 1, 1970).
* Any "date/time" type can be coerced into a String. 

In the case of Avro, the type of "int" and logicalType of "date" is stored as 
an integer representing the number of days since 01 Jan 1970. This works with 
the AvroRecordSetWriter, but not with the CSVRecordSetWriter or 
JSONRecordSetWriter.
Thus it is inconsistent with the rules outlined above.

Consider a date of 2017-01-11 (11th Jan 2017) and write Avro schema of:
{ "name": "MY_DATE" , "type": { "type":"int", "logicalType":"date"} }

This is stored as follows:
Avro: 17177
CSV: 1484092800000
JSON: 1484092800000

It appears in the latter 2 cases that the schema specification is ignored. 
The data is stored as a Long value even though an Int was specified in the 
"type" attribute of the schema.
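
The two encodings above can be reproduced with plain java.time arithmetic. 
The following is only a minimal sketch, not NiFi's writer code: the class and 
method names are hypothetical, and it assumes the date is interpreted at 
midnight UTC.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not part of NiFi: contrasts the two encodings of a date.
class DateEncodingSketch {

    // Avro "date" logical type: number of days since 1970-01-01, held in an int.
    static int toAvroEpochDays(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Long encoding: milliseconds since epoch, taken at midnight UTC (assumption).
    static long toEpochMillis(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2017, 1, 11);
        System.out.println("days:   " + toAvroEpochDays(date)); // 17177
        System.out.println("millis: " + toEpochMillis(date));   // 1484092800000
    }
}
```

The two values differ by a factor of 86,400,000 (milliseconds per day), so a 
reader expecting days cannot interpret the millisecond value correctly.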

The same reasoning applies to the time-millis logicalType, which annotates an 
Int in the Avro standard. (time-micros, by contrast, annotates a Long.) 

Changing this default to align with the Avro standard for logicalTypes may 
break existing implementations. 
Certainly it should be changed for the case where the output schema explicitly 
asks for an Int output type (or, at the very least, the type coercion from 
Long to Int should fail).
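
As an illustration of the "fail the coercion" option, a checked narrowing 
conversion rejects a millisecond value that does not fit in an Int instead of 
silently truncating it. This is a sketch only: the class and method names are 
hypothetical and do not correspond to NiFi's coercion utilities.

```java
// Hypothetical helper, not part of NiFi: a Long-to-Int coercion that fails loudly.
class IntCoercionSketch {

    // Math.toIntExact throws ArithmeticException when the value overflows an int.
    static int coerceToInt(long value) {
        return Math.toIntExact(value);
    }

    public static void main(String[] args) {
        long epochMillis = 1484092800000L; // 2017-01-11 as milliseconds since epoch

        // A plain cast keeps only the low 32 bits and yields a meaningless number.
        System.out.println("unchecked cast: " + (int) epochMillis);

        try {
            coerceToInt(epochMillis);
        } catch (ArithmeticException e) {
            System.out.println("coercion rejected: " + e.getMessage());
        }

        // A days-since-epoch value fits in an int and passes through unchanged.
        System.out.println("days value: " + coerceToInt(17177L));
    }
}
```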

Test flow is attached. The problem is replicated by switching between the 
various RecordSetWriter controllers on the ConvertRecord processor and 
observing the output flowfile content in each case.

  was:
Type Coercion rules currently allow for the following conversions regarding 
Date, Time and Timestamp fields:

* Any "date/time" type (Date, Time, Timestamp) can be coerced into any other 
"date/time" type.
* Any "date/time" type can be coerced into a Long type, representing the number 
of milliseconds since epoch (Midnight GMT, January 1, 1970).
* Any "date/time" type can be coerced into a String. 

In the case of Avro, the type of "int" and logicalType of "date" is stored as 
an integer representing the number of days since 01 Jan 1970. This works 
consistently with the AvroRecordSetWriter, but not with the CSVRecordSetWriter 
or JSONRecordSetWriter.

Consider a date of 2017-01-11 (11th Jan 2017) and write Avro schema of:
{ "name": "MY_DATE" , "type": { "type":"int", "logicalType":"date"} }

This is stored as follows:
Avro: 17177
CSV: 1484092800000
JSON: 1484092800000

It appears in the latter 2 cases that the schema specification is ignored. 
The data is stored as a Long value even though an Int was specified in the 
"type" attribute of the schema.

The same reasoning applies to the time-millis logicalType, which annotates an 
Int in the Avro standard. (time-micros, by contrast, annotates a Long.) 

Changing this default to align with the Avro standard for logicalTypes may 
break existing implementations. 
Certainly it should be changed for the case where the output schema explicitly 
asks for an Int output type (or, at the very least, the type coercion from 
Long to Int should fail).

Test flow is attached. The problem is replicated by switching between the 
various RecordSetWriter controllers on the ConvertRecord processor and 
observing the output flowfile content in each case.


> Inconsistent Type Coercion for Date and Time field types
> --------------------------------------------------------
>
>                 Key: NIFI-4182
>                 URL: https://issues.apache.org/jira/browse/NIFI-4182
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Bjorn Olsen
>            Priority: Minor
>         Attachments: Field_Conversion_Test_1.xml



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
