[jira] [Commented] (NIFI-981) Add support for Hive JDBC / ExecuteSQL

Matt Hutton (JIRA) Sat, 07 Nov 2015 00:51:20 -0800

    [ 
https://issues.apache.org/jira/browse/NIFI-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995125#comment-14995125
 ]


Matt Hutton commented on NIFI-981:
----------------------------------

The fix will involve more than just bundling the Hive jars correctly. 
ExecuteSQL and JdbcCommon make assumptions about JDBC driver behavior that the 
Hive driver does not satisfy, which causes errors:

1) java.sql.Statement.setQueryTimeout - unsupported in the Hive version tested
2) ResultSetMetaData.getColumnName - JdbcCommon fails because the column name 
is unexpectedly qualified, as in <table>.<column name>
3) ResultSetMetaData.getTableName - throws an unsupported method error

The comments above relate to the following Hive driver:
        <dependency> 
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>0.14.0</version>
        </dependency>

Brute force local fix:

1) Created a class similar to ExecuteSQL.
2) Decorated HiveResultSet and HiveMetaData to override the problematic methods 
and return the valid values that JdbcCommon expects.
3) Catch and log a warning if setQueryTimeout is invoked but not supported.
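
For illustration, the workaround described above could look roughly like the 
sketch below. This is not the code from my local fix: the class and method 
names (HiveJdbcWorkarounds, unqualifiedColumnName, trySetQueryTimeout, lenient) 
are hypothetical, and it uses a java.lang.reflect.Proxy rather than hand-written 
decorator classes. It assumes the driver signals unsupported methods by throwing 
SQLException.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.logging.Logger;

// Hypothetical helper; not part of NiFi or the Hive driver.
final class HiveJdbcWorkarounds {
    private static final Logger LOG =
            Logger.getLogger(HiveJdbcWorkarounds.class.getName());

    private HiveJdbcWorkarounds() {}

    // Strips a leading "<table>." qualifier. Assumption: real column names
    // contain no dots; a name that does would be truncated here.
    static String unqualifiedColumnName(String name) {
        if (name == null) {
            return null;
        }
        int dot = name.lastIndexOf('.');
        return dot >= 0 ? name.substring(dot + 1) : name;
    }

    // Tries to set the timeout; logs a warning and returns false if the
    // driver rejects the call. Note this also swallows other SQLExceptions.
    static boolean trySetQueryTimeout(Statement statement, int seconds) {
        try {
            statement.setQueryTimeout(seconds);
            return true;
        } catch (SQLException e) {
            LOG.warning("setQueryTimeout not applied: " + e.getMessage());
            return false;
        }
    }

    // Wraps metadata so getColumnName is unqualified and getTableName
    // returns "" (the JDBC "not applicable" value) instead of throwing.
    static ResultSetMetaData lenient(ResultSetMetaData delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                Object result = method.invoke(delegate, args);
                if ("getColumnName".equals(method.getName())) {
                    return unqualifiedColumnName((String) result);
                }
                return result;
            } catch (InvocationTargetException e) {
                if ("getTableName".equals(method.getName())
                        && e.getCause() instanceof SQLException) {
                    return "";
                }
                throw e.getCause(); // rethrow the driver's own exception
            }
        };
        return (ResultSetMetaData) Proxy.newProxyInstance(
                ResultSetMetaData.class.getClassLoader(),
                new Class<?>[] {ResultSetMetaData.class},
                handler);
    }
}
```

An ExecuteSQL-like processor would then call 
HiveJdbcWorkarounds.lenient(resultSet.getMetaData()) before handing the 
metadata to the Avro conversion, and trySetQueryTimeout(statement, seconds) in 
place of a direct setQueryTimeout call.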

The test case for ExecuteSQL assumes Derby, which isn't necessarily 
representative of all SQL drivers.  It would be great to have an integration 
test suite that runs this generic SQL processor against common databases 
(MySQL, Postgres, Hive, Spark, Teradata, Oracle, etc.).

> Add support for Hive JDBC / ExecuteSQL
> --------------------------------------
>
>                 Key: NIFI-981
>                 URL: https://issues.apache.org/jira/browse/NIFI-981
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Joseph Witt
>            Assignee: Oleg Zhurakousky
>             Fix For: 1.0.0
>
>
> In this mailing list thread from September 2015, "NIFI DBCP connection pool 
> not working for hive", the main thrust of the conversation is to provide 
> proper support for delivering data to Hive.  Hive's JDBC driver appears to 
> have dependencies on Hadoop libraries.  We need to be careful/thoughtful 
> about how to best support this so that different versions of Hadoop distros 
> can be supported (potentially in parallel on the same flow).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
