[
https://issues.apache.org/jira/browse/NIFI-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Burgess updated NIFI-5517:
-------------------------------
Description:
NIFI-5475 upgraded the version of Hive 3 to Apache Hive 3.1.0, and some code
changes had to be made as there were new Writable types to be used in place of
the old ones. As a result of the upgrade, NIFI-5491 was discovered and included
some fixes for PutHive3Streaming to support various primitive data types as
well as structs.
However, it appears that NIFI-5491 did not cover all the available Hive types,
and NIFI-5475's changes were to PutORC for time/date types, but similar changes
need to be made to the Hive writer as well.
PutHive3Streaming should support all available Hive types where prudent (i.e.
where NiFi Record Field Types can be converted to Hive Column Types). The
current list of "top-level" types includes:
PRIMITIVE, LIST, MAP, STRUCT, UNION
And the current list of PRIMITIVE types includes:
VOID, BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, STRING,
DATE, TIMESTAMP, TIMESTAMPLOCALTZ, BINARY, DECIMAL, VARCHAR, CHAR,
INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME, UNKNOWN
As of NIFI-5475 I believe PutHive3Streaming supports BOOLEAN, BYTE, SHORT, INT,
LONG, FLOAT, DOUBLE, STRING, VARCHAR, and CHAR for primitive types, and with
respect to the supported primitive types, PutHive3Streaming supports LIST,
STRUCT, and UNION.
The remaining list is MAP, DATE, TIMESTAMP, BINARY, DECIMAL,
INTERVAL_YEAR_MONTH, and INTERVAL_DAY_TIME. Some of these may already be
supported (such as the INTERVALs if the incoming data type is INT or LONG) but
need to be confirmed. VOID and UNKNOWN are used "under the hood" I believe, and
ORC doesn't currently support TIMESTAMPLOCALTZ.
was:
NIFI-5475 upgraded the version of Hive 3 to Apache Hive 3.1.0, and some code
changes had to be made as there were new Writable types to be used in place of
the old ones. As a result of the upgrade, NIFI-5491 was discovered and included
some fixes for PutHive3Streaming to support various primitive data types as
well as structs.
However, it appears that NIFI-5491 did not cover all the available Hive types,
and NIFI-5475's changes were to PutORC for time/date types, but similar changes
need to be made to the Hive writer as well.
PutHive3Streaming should support all available Hive types where prudent (i.e.
where NiFi Record Field Types can be converted to Hive Column Types). The
current list of "top-level" types includes:
PRIMITIVE, LIST, MAP, STRUCT, UNION
And the current list of PRIMITIVE types includes:
VOID, BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, STRING,
DATE, TIMESTAMP, TIMESTAMPLOCALTZ, BINARY, DECIMAL, VARCHAR, CHAR,
INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME, UNKNOWN
As of NIFI-5475 I believe PutHive3Streaming supports BOOLEAN, BYTE, SHORT, INT,
LONG, FLOAT, DOUBLE, STRING, VARCHAR, and CHAR for primitive types (VOID and
UNKNOWN are used "under the hood" I believe), and with respect to the supported
primitive types, PutHive3Streaming supports LIST, STRUCT, and UNION.
The remaining list is MAP, DATE, TIMESTAMP, TIMESTAMPLOCALTZ, BINARY, DECIMAL,
INTERVAL_YEAR_MONTH, and INTERVAL_DAY_TIME. Some of these may already be
supported (such as the INTERVALs if the incoming data type is INT or LONG) but
need to be confirmed.
> PutHive3Streaming does not correctly support all Hive types
> -----------------------------------------------------------
>
> Key: NIFI-5517
> URL: https://issues.apache.org/jira/browse/NIFI-5517
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Priority: Major
>
> NIFI-5475 upgraded the version of Hive 3 to Apache Hive 3.1.0, and some code
> changes had to be made as there were new Writable types to be used in place
> of the old ones. As a result of the upgrade, NIFI-5491 was discovered and
> included some fixes for PutHive3Streaming to support various primitive data
> types as well as structs.
> However, it appears that NIFI-5491 did not cover all the available Hive types,
> and NIFI-5475's changes were to PutORC for time/date types, but similar
> changes need to be made to the Hive writer as well.
> PutHive3Streaming should support all available Hive types where prudent (i.e.
> where NiFi Record Field Types can be converted to Hive Column Types). The
> current list of "top-level" types includes:
> PRIMITIVE, LIST, MAP, STRUCT, UNION
> And the current list of PRIMITIVE types includes:
> VOID, BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, STRING,
> DATE, TIMESTAMP, TIMESTAMPLOCALTZ, BINARY, DECIMAL, VARCHAR, CHAR,
> INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME, UNKNOWN
> As of NIFI-5475 I believe PutHive3Streaming supports BOOLEAN, BYTE, SHORT,
> INT, LONG, FLOAT, DOUBLE, STRING, VARCHAR, and CHAR for primitive types, and
> with respect to the supported primitive types, PutHive3Streaming supports
> LIST, STRUCT, and UNION.
> The remaining list is MAP, DATE, TIMESTAMP, BINARY, DECIMAL,
> INTERVAL_YEAR_MONTH, and INTERVAL_DAY_TIME. Some of these may already be
> supported (such as the INTERVALs if the incoming data type is INT or LONG)
> but need to be confirmed. VOID and UNKNOWN are used "under the hood" I
> believe, and ORC doesn't currently support TIMESTAMPLOCALTZ.
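
The "remaining" list in the description is the set difference of the primitive
types listed above, the types believed to be supported, and the types set aside
(VOID, UNKNOWN, TIMESTAMPLOCALTZ). A minimal sketch of that bookkeeping follows.
The class HiveTypeCoverage and its enum are hypothetical and only mirror the
names in the description; they are not NiFi or Hive classes, and the "supported"
set is the belief stated in this issue, not something verified against the
processor. MAP is a top-level type and is tracked separately from this
primitive-only check.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of NiFi or Hive: it only restates the lists
// from the issue description so the "remaining" set can be recomputed.
class HiveTypeCoverage {

    // Primitive categories exactly as listed in the description.
    enum Primitive {
        VOID, BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, STRING,
        DATE, TIMESTAMP, TIMESTAMPLOCALTZ, BINARY, DECIMAL, VARCHAR, CHAR,
        INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME, UNKNOWN
    }

    // Believed supported as of NIFI-5475 (an assumption taken from the description).
    static final Set<Primitive> SUPPORTED = EnumSet.of(
            Primitive.BOOLEAN, Primitive.BYTE, Primitive.SHORT, Primitive.INT,
            Primitive.LONG, Primitive.FLOAT, Primitive.DOUBLE, Primitive.STRING,
            Primitive.VARCHAR, Primitive.CHAR);

    // Set aside: used "under the hood" (VOID, UNKNOWN) or not supported by ORC.
    static final Set<Primitive> OUT_OF_SCOPE = EnumSet.of(
            Primitive.VOID, Primitive.UNKNOWN, Primitive.TIMESTAMPLOCALTZ);

    // Primitive types that still need to be implemented or confirmed.
    static Set<Primitive> remaining() {
        Set<Primitive> rest = EnumSet.allOf(Primitive.class);
        rest.removeAll(SUPPORTED);
        rest.removeAll(OUT_OF_SCOPE);
        return rest;
    }

    public static void main(String[] args) {
        // prints [DATE, TIMESTAMP, BINARY, DECIMAL, INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME]
        System.out.println(remaining());
    }
}
```

Keeping the three sets explicit makes it easy to move a type from "remaining"
to "supported" once it has been confirmed, for example the INTERVAL types with
INT or LONG input.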