[jira] [Commented] (IMPALA-7131) Support external data sources in local catalog mode

ASF subversion and git services (Jira) Tue, 10 Oct 2023 15:32:04 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773839#comment-17773839
 ]


ASF subversion and git services commented on IMPALA-7131:
---------------------------------------------------------

Commit c2bd30a1b3b49ccc7770ec5ab4adeb0b75f40240 in impala's branch 
refs/heads/master from Fucun Chu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c2bd30a1b ]

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement a data source for querying RDBMS tables over JDBC.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, i.e., the fragment is unpartitioned and
    queries are executed on the coordinator.
  - Queries that read the following data types from external JDBC
    tables are not supported:
    BINARY, CHAR, DATETIME, and COMPLEX.
  - Only binary predicates with the operators =, !=, <=, >=, <, and >
    can be pushed down to the RDBMS.
  - The following data types are not supported in predicates:
    DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with columns of complex types are not supported.
  - Support is limited to the following databases:
    MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).
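
The two predicate limitations above can be read as a simple eligibility
check. The following is a minimal sketch in shell, not Impala's actual
implementation: the function name `can_push_down` and its argument
convention (operator, then column type) are hypothetical and exist only
to illustrate the rules as stated in this commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Impala.
# Usage: can_push_down <operator> <column-type>
# Returns 0 if a binary predicate with this operator on a column of this
# type is eligible for pushdown under the limitations listed above,
# and 1 otherwise.
can_push_down() {
  local op="$1"
  local col_type
  # Normalize the type name to upper case so "date" and "DATE" match.
  col_type="$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"

  # Only these six comparison operators can be pushed to the RDBMS.
  case "$op" in
    '='|'!='|'<='|'>='|'<'|'>') ;;
    *) return 1 ;;
  esac

  # Predicates on these column types are not supported.
  # DECIMAL* also covers parameterized forms such as DECIMAL(10,2).
  case "$col_type" in
    DECIMAL*|TIMESTAMP|DATE|BINARY) return 1 ;;
  esac

  return 0
}
```

For example, `can_push_down '>=' INT` succeeds, while
`can_push_down '=' TIMESTAMP` and `can_push_down LIKE STRING` both fail.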

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

To query the RDBMS tables, follow these steps (note that existing
data source tables will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f \
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. The data source tables created in the previous step are now ready
to query. There is no need to restart the Impala cluster.
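
Steps 2 to 4 above can be chained into one function. This is a sketch
under stated assumptions, not a script shipped with Impala: it assumes
the cluster is already running (step 1) and that IMPALA_HOME points at
the source tree. The names `run_step`, `setup_jdbc_data_source`, and the
`DRY_RUN` variable are hypothetical; only the three script paths come
from the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: run_step, setup_jdbc_data_source and DRY_RUN are
# illustrative names, not part of Impala.

# Print the command, then execute it unless DRY_RUN=1.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Runs steps 2-4 in order and stops at the first failure.
setup_jdbc_data_source() {
  if [ -z "${IMPALA_HOME:-}" ]; then
    echo "IMPALA_HOME must be set" >&2
    return 1
  fi
  local bin="${IMPALA_HOME}/testdata/bin"

  # Step 2: copy the JDBC driver jars and data source library into HDFS.
  run_step "${bin}/copy-ext-data-sources.sh" &&
  # Step 3: create the alltypes table in the Postgres database.
  run_step "${bin}/load-ext-data-sources.sh" &&
  # Step 4: create the data source tables.
  run_step "${IMPALA_HOME}/bin/impala-shell.sh" \
    -f "${bin}/create-ext-data-source-table.sql"
}
```

Calling `DRY_RUN=1 setup_jdbc_data_source` prints the three commands
without running them, which is a cheap way to check the paths first.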

Testing:
 - Added unit tests for Postgres and ran them with JDBC driver
   postgresql-42.5.1.jar.
 - Manually ran the unit tests for MySQL with JDBC driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou <[email protected]>
Reviewed-by: Kurt Deschler <[email protected]>
Reviewed-by: Riza Suminto <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Support external data sources in local catalog mode
> ---------------------------------------------------
>
>                 Key: IMPALA-7131
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7131
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog, Frontend
>            Reporter: Todd Lipcon
>            Assignee: Wenzhe Zhou
>            Priority: Minor
>              Labels: catalog-v2
>
> Currently it seems that external data sources are not persisted except in 
> memory on the catalogd. This means that it will be somewhat more difficult to 
> support this feature in the design of impalad without a catalogd.
> This JIRA is to eventually figure out a way to support this feature -- either 
> by supporting in-memory on a per-impalad basis, or perhaps by figuring out a 
> way to register them persistently in a file system directory, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

