[ https://issues.apache.org/jira/browse/NIFI-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545307#comment-15545307 ]

Matt Burgess commented on NIFI-2448:
------------------------------------

[~tom_dom] All the Hive processors are in a single NiFi ARchive (aka NAR) so 
they can share Hadoop/Hive dependencies. In order to support more Hive 
deployments in the field, this Jira was raised to downgrade the Hive NAR (which 
includes PutHiveStreaming, the two HiveQL processors, and ConvertAvroToORC) to 
Apache Hive 1.2.1. However, there are still versions of Hive/Hadoop that are not 
fully compatible with the NiFi 1.0.0 Hive processors, notably Hortonworks Data 
Platform 2.5 and your Cloudera instance (which is based on Hive 1.1, but I don't 
believe it is the Apache Hive 1.1 release; rather, it has a 1.1 baseline with 
other things added).

Getting the Hadoop and Hive processors in lockstep for greatest compatibility 
has been a challenge, and solutions have been offered/proposed (such as 
NIFI-710, NIFI-2026, and NIFI-2828). With the latter two, a workaround is to 
build your own Hive NAR against a chosen version of Hadoop/Hive. For example, 
to build the Hive NAR for HDP 2.5 compatibility:

mvn clean install -Phortonworks -Dhive.version=1.2.1000.2.5.0.0-1245 -Dhadoop.version=2.7.3.2.5.0.0-1245

However, for Cloudera 5.5.2, the baseline version of Hive 1.1 is even older than 
the NiFi baseline (Apache Hive 1.2.1), and as a result it does not have some of 
the classes/fixes needed by the Hive NAR code (such as the ConvertAvroToORC 
processor). In order to create a Hive NAR to support that version, you 
would need to remove the ConvertAvroToORC processor (and its auxiliary classes) 
from the code, then build with something like:

mvn clean install -Pcloudera -Dhive.version=1.1.0-cdh5.5.2 -Dhadoop.version=2.6.0-cdh5.5.2
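If you need to rebuild for more than one platform, the two mvn invocations in 
this comment can be wrapped in a small shell function. This is a sketch only: 
the function name build_hive_nar, the distro keys hdp-2.5 and cdh-5.5.2, and 
the DRY_RUN switch are hypothetical conveniences, while the profiles and 
version strings are copied from the commands in this comment. It assumes you 
run it from the same directory you would run "mvn clean install" in, and it 
does NOT remove ConvertAvroToORC for the Cloudera build; that step is still 
manual.

```shell
# Hypothetical helper (not part of NiFi): build a vendor-specific Hive NAR
# using the Maven profiles and versions quoted in this comment.
# Set DRY_RUN=1 to print the mvn command instead of running it.
build_hive_nar() {
  local distro="$1" profile hive_version hadoop_version
  case "$distro" in
    hdp-2.5)
      profile=hortonworks
      hive_version=1.2.1000.2.5.0.0-1245
      hadoop_version=2.7.3.2.5.0.0-1245
      ;;
    cdh-5.5.2)
      # ConvertAvroToORC (and its auxiliary classes) must already have been
      # removed from the code; this function does not do that.
      profile=cloudera
      hive_version=1.1.0-cdh5.5.2
      hadoop_version=2.6.0-cdh5.5.2
      ;;
    *)
      echo "unknown distro: $distro" >&2
      return 1
      ;;
  esac
  # Build the command as an array so each argument stays intact.
  local cmd=(mvn clean install "-P$profile" "-Dhive.version=$hive_version" "-Dhadoop.version=$hadoop_version")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 build_hive_nar hdp-2.5 prints the HDP 2.5 command 
above without running Maven, which is a cheap way to check the versions 
before starting a long build.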

A different solution might be to create a hive-libraries-nar and split up the 
processors into separate NARs, but that's a lot of overhead just to support 
various distros that have their own versions of Hadoop/Hive, especially when 
the above technique can be used to build the right NAR for the right vendor 
platform/version. Out of the box, Apache NiFi 1.0.0 works with Apache Hadoop 
2.6.2 and Apache Hive 1.2.1. When incompatibilities arise, they are almost 
always due to extra "features" being in vendor deployments that aren't in the 
corresponding Apache baseline.

Hopefully this is just an awkward transition phase for these project(s), and 
with any luck we can soon upgrade NiFi to use a newer Hadoop, Hive 2.x, and 
Apache ORC, the combination of which should give a better level of 
compatibility than we're seeing at present.


> Hive Processors depend on too recent a Hive version
> ---------------------------------------------------
>
>                 Key: NIFI-2448
>                 URL: https://issues.apache.org/jira/browse/NIFI-2448
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0
>            Reporter: Simon Elliston Ball
>            Priority: Critical
>
> The new Hive bundle depends on version 2.0.0 of Hive. This means that it can 
> only connect to very recent Hive distributions. 
> Sadly very few people in the field have upgraded their Hive to the latest and 
> greatest, and as per https://issues.apache.org/jira/browse/HIVE-6050 the 
> issue of backward compatibility in the client is still not resolved.
> We should look at lowering the dependency version to allow connections with 
> older Hive distros.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
