[
https://issues.apache.org/jira/browse/NIFI-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545307#comment-15545307
]
Matt Burgess commented on NIFI-2448:
------------------------------------
[~tom_dom] All the Hive processors are in a single NiFi ARchive (aka NAR) so
they can share Hadoop/Hive dependencies. In order to support more Hive
deployments in the field, this Jira was raised to downgrade the Hive NAR (which
includes PutHiveStreaming, the two HiveQL processors, and ConvertAvroToORC) to
Apache Hive 1.2.1. However, there are still versions of Hive/Hadoop that are
not fully compatible with the NiFi 1.0.0 Hive processors, notably Hortonworks
Data Platform 2.5 and your Cloudera instance (which is based on Hive 1.1,
though I don't believe it is the Apache Hive 1.1 release; rather, it has a 1.1
baseline with other things added).
Getting the Hadoop and Hive processors in lockstep for greatest compatibility
has been a challenge, and solutions have been offered/proposed (such as
NIFI-710, NIFI-2026, and NIFI-2828). With the latter two, a workaround is to
build your own Hive NAR against a chosen version of Hadoop/Hive. For example,
to build the Hive NAR for HDP 2.5 compatibility:
mvn clean install -Phortonworks -Dhive.version=1.2.1000.2.5.0.0-1245 \
  -Dhadoop.version=2.7.3.2.5.0.0-1245
However, for Cloudera 5.5.2, the baseline version of Hive (1.1) is even older
than the NiFi baseline (Apache Hive 1.2.1), and as a result it does not have
some of the classes/fixes needed by the Hive NAR code (such as the
ConvertAvroToORC processor). In order to create a Hive NAR to support that
version, you would need to remove the ConvertAvroToORC processor (and its
auxiliary classes) from the code, then build with something like:
mvn clean install -Pcloudera -Dhive.version=1.1.0-cdh5.5.2 \
  -Dhadoop.version=2.6.0-cdh5.5.2
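Since the two builds differ only in the Maven profile and the two version
properties, the pattern could be wrapped in a small shell helper. This is only
a rough sketch: the function name hive_nar_build_cmd is made up for
illustration, and it just prints the command (a dry run) rather than running
Maven, so you can check it before executing it from the NiFi source tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of NiFi): prints the Maven command for a
# vendor-specific Hive NAR build. The profile names and version strings used
# below are the ones quoted in this comment; everything else is illustrative.
hive_nar_build_cmd() {
    profile="${1:-}"         # Maven profile, e.g. hortonworks or cloudera
    hive_version="${2:-}"    # vendor Hive version
    hadoop_version="${3:-}"  # vendor Hadoop version

    # Refuse to print a partial command if any argument is missing.
    if [ -z "$profile" ] || [ -z "$hive_version" ] || [ -z "$hadoop_version" ]; then
        echo "usage: hive_nar_build_cmd <profile> <hive.version> <hadoop.version>" >&2
        return 1
    fi

    printf 'mvn clean install -P%s -Dhive.version=%s -Dhadoop.version=%s\n' \
        "$profile" "$hive_version" "$hadoop_version"
}

# Dry run for the two platforms discussed above.
hive_nar_build_cmd hortonworks 1.2.1000.2.5.0.0-1245 2.7.3.2.5.0.0-1245
hive_nar_build_cmd cloudera 1.1.0-cdh5.5.2 2.6.0-cdh5.5.2
```

Note that for the Cloudera case the printed command is only expected to work
after ConvertAvroToORC and its auxiliary classes have been removed from the
code, as described above.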
A different solution might be to create a hive-libraries-nar and split up the
processors into separate NARs, but that's a lot of overhead just to support
various distros that have their own versions of Hadoop/Hive, especially when
the above technique can be used to build the right NAR for the right vendor
platform/version. Out of the box, Apache NiFi 1.0.0 works with Apache Hadoop
2.6.2 and Apache Hive 1.2.1. When incompatibilities arise, they are almost
always due to extra "features" being in vendor deployments that aren't in the
corresponding Apache baseline.
Hopefully this is just an awkward transition phase for these project(s), and
with any luck we can soon upgrade NiFi to use a newer Hadoop, Hive 2.x, and
Apache ORC, the combination of which should give a better level of
compatibility than we're seeing at present.
> Hive Processors depend on too recent a Hive version
> ---------------------------------------------------
>
> Key: NIFI-2448
> URL: https://issues.apache.org/jira/browse/NIFI-2448
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.0.0
> Reporter: Simon Elliston Ball
> Priority: Critical
>
> The new Hive bundle depends on version 2.0.0 of Hive. This means that it can
> only connect to very recent Hive distributions.
> Sadly very few people in the field have upgraded their Hive to the latest and
> greatest, and as per https://issues.apache.org/jira/browse/HIVE-6050 the
> issue of backward compatibility in the client is still not resolved.
> We should look at lowering the dependency version to allow connections with
> older Hive distros.