Repository: incubator-impala Updated Branches: refs/heads/master 34b5f1c41 -> 05b19bb84
IMPALA-4009: qgen: add docs, utility for installing Oracle as ref database This patch adds documentation and a small utility for users wishing to run the random query generator against Oracle as a reference database. In particular, the patch * points users toward installing Oracle Database. * provides directions on incorporating a Python Oracle client (cx_Oracle) into our impala-python virtual environment. * provides a small utility that will verify installation and connectivity between client and server This documentation isn't meant to be a complete guide. Other patches will add documentation for actually loading random query generator test data into Oracle, for example. Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158 Reviewed-on: http://gerrit.cloudera.org:8080/4095 Reviewed-by: Thomas Tauber-Marshall <[email protected]> Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Matthew Jacobs <[email protected]> Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/05b19bb8 Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/05b19bb8 Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/05b19bb8 Branch: refs/heads/master Commit: 05b19bb848d5c01f09715aa79c5858d7cca79b8a Parents: 34b5f1c Author: Michael Brown <[email protected]> Authored: Tue Aug 23 09:06:16 2016 -0700 Committer: Matthew Jacobs <[email protected]> Committed: Thu Aug 25 15:20:02 2016 +0000 ---------------------------------------------------------------------- tests/comparison/ORACLE.txt | 90 ++++++++++++++++++++ tests/comparison/README | 14 ++- .../comparison/util/verify-oracle-connection.py | 63 ++++++++++++++ 3 files changed, 166 insertions(+), 1 deletion(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/05b19bb8/tests/comparison/ORACLE.txt ---------------------------------------------------------------------- diff --git a/tests/comparison/ORACLE.txt b/tests/comparison/ORACLE.txt new file mode 100644 index 0000000..bc91dbb --- /dev/null +++ b/tests/comparison/ORACLE.txt @@ -0,0 +1,90 @@ +Contents + +I. Note +II. Oracle Database server setup +III. Oracle client setup +IV. Oracle connectivity, cx_Oracle installation +TODO: config/setup +TODO: data load/migrate + +I. Note + +Due to licensing, we cannot distribute Oracle client binaries, and of +course we will not distribute Oracle Database. Please refer to these +guidelines if you need to use Oracle as a reference database for the +Random Query Generator. + + +II. Oracle Database server setup + +There are lots of choices for Oracle Database. What follows are two +suggestions: + +http://www.oracle.com/technetwork/database/database-technologies/express-edition/downloads/index.html + +https://hub.docker.com/r/wnameless/oracle-xe-11g/ + +Follow their instructions for installation and setup. If you can use +sqlplus or some other shell to successfully "select 1 from dual;", then +you're good to go here. + + +III. Oracle client setup + +Use cx_Oracle. + +http://cx-oracle.sourceforge.net/ + +Note: Installing this via pip or similar means won't work until you have +first installed prerequisites. + +1. Install Oracle instant-client + +http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html + +After you choose your architecture, you need + + 1. the basic client + 2. the SDK + +You will likely have to register on the Oracle site to download the +instant client binaries. As of this writing, install instructions are at +the bottom of the page once you make your architecture choice. + +2. Additional dependent packages in distribution + +It's possible either when trying to install cx_Oracle, import it, or use +it, you will get some linker error. I cannot predict what libraries you +do and don't have. Here's a list of Linux distributions and known +package dependencies: + +Ubuntu 14.04.2 LTS +libaio1 + +3. Ensure impala-python is working. It may be enough to simply run: + + $ impala-python + + Getting it working is out of scope of this document. + +4. Run these commands to install cx_Oracle into the impala-python + virtual environment: + + $ source "${IMPALA_HOME}"/infra/python/env/bin/activate + (env)$ pip install cx_Oracle==5.2.1 + (env)$ deactivate + +If pip install fails, double check the following: + * ORACLE_HOME environment variable + * LD_LIBRARY_PATH environment variable + +BUILD.txt packaged with cx_Oracle has more instructions for installing +cx_Oracle. + + +IV. Oracle connectivity, cx_Oracle installation + +Verify cx_Oracle can connect to your Oracle Database. Edit appopriately +and then run: + + $ "${IMPALA_HOME}"/tests/comparison/util/verify-oracle-connection.py http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/05b19bb8/tests/comparison/README ---------------------------------------------------------------------- diff --git a/tests/comparison/README b/tests/comparison/README index acb85b8..579a8cf 100644 --- a/tests/comparison/README +++ b/tests/comparison/README @@ -22,6 +22,9 @@ Requirements: sudo apt-get install python-mysqldb sudo apt-get install python-psycopg2 # Postgresql +Please see the supplemental ORACLE.txt on setting up Oracle as a reference +database. + Usage: @@ -74,7 +77,9 @@ Miscellaneous: Postgresql: -1) Supports basically all Impala language features +1) Supports basically all Impala language features. Exceptions include: + + a) IGNORE NULLS clause for analytic functions 2) Has strange sorting of strings, '-1' > '1'. This may be important if ORDER BY is ever used. The databases being compared would need to have the same collation, which is @@ -86,6 +91,13 @@ MySQL: 2) Has poor boolean support. +Oracle: + +1) No Boolean support. + +2) Better analytic function support, e.g. "IGNORE NULLS" is supported. + +3) Strange oddities abound, like no LIMIT clause. Improvements: http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/05b19bb8/tests/comparison/util/verify-oracle-connection.py ---------------------------------------------------------------------- diff --git a/tests/comparison/util/verify-oracle-connection.py b/tests/comparison/util/verify-oracle-connection.py new file mode 100755 index 0000000..3db4ed1 --- /dev/null +++ b/tests/comparison/util/verify-oracle-connection.py @@ -0,0 +1,63 @@ +#!/usr/bin/env impala-python + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# This is a script that allows a tester to test his Oracle setup and cx_Oracle +# installation within his impala-python environment. It's meant to be super simple. The +# emphasis here is on a cx_Oracle connection and cursor with no other distractions, even +# command line option parsing/handling. Modify the constants below and run: +# +# $ ./verify-oracle-connection.py +# +# If you get an exception, something is wrong. If cx_Oracle was able to make a +# connection to Oracle and run a simple query, you'll get a success message. +# +# Refer to ORACLE.txt for help. + +# Importing the whole module instead of doing selective import seems to help find linker +# errors. +import cx_Oracle + +# Host on which Oracle Database lies. +HOST = '127.0.0.1' + +# The values below are default for the version of Oracle Database Express Edition +# tested. You may need to change these as needed: +PORT = 1521 +SID = 'XE' +USER = 'system' +PASSWORD = 'oracle' + + +def main(): + TEST_QUERY = 'SELECT 1 FROM DUAL' + EXPECTED_RESULT = [(1,)] + dsn = cx_Oracle.makedsn(HOST, PORT, SID) + with cx_Oracle.connect(user=USER, password=PASSWORD, dsn=dsn) as conn: + try: + cursor = conn.cursor() + query = cursor.execute(TEST_QUERY) + rows = query.fetchall() + finally: + cursor.close() + assert rows == EXPECTED_RESULT + print 'success' + + +if '__main__' == __name__: + main()
