Repository: incubator-impala
Updated Branches:
  refs/heads/master 34b5f1c41 -> 05b19bb84


IMPALA-4009: qgen: add docs, utility for installing Oracle as ref database

This patch adds documentation and a small utility for users wishing to
run the random query generator against Oracle as a reference database.
In particular, the patch

* points users toward installing Oracle Database.
* provides directions on incorporating a Python Oracle client
  (cx_Oracle) into our impala-python virtual environment.
* provides a small utility that will verify installation and
  connectivity between client and server

This documentation isn't meant to be a complete guide. Other patches
will add documentation for actually loading random query generator test
data into Oracle, for example.

Change-Id: Ib3bf84b7afa8851c49a8d0f0a1ceded94f4de158
Reviewed-on: http://gerrit.cloudera.org:8080/4095
Reviewed-by: Thomas Tauber-Marshall <[email protected]>
Reviewed-by: Matthew Jacobs <[email protected]>
Tested-by: Matthew Jacobs <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/05b19bb8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/05b19bb8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/05b19bb8

Branch: refs/heads/master
Commit: 05b19bb848d5c01f09715aa79c5858d7cca79b8a
Parents: 34b5f1c
Author: Michael Brown <[email protected]>
Authored: Tue Aug 23 09:06:16 2016 -0700
Committer: Matthew Jacobs <[email protected]>
Committed: Thu Aug 25 15:20:02 2016 +0000

----------------------------------------------------------------------
 tests/comparison/ORACLE.txt                     | 90 ++++++++++++++++++++
 tests/comparison/README                         | 14 ++-
 .../comparison/util/verify-oracle-connection.py | 63 ++++++++++++++
 3 files changed, 166 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/05b19bb8/tests/comparison/ORACLE.txt
----------------------------------------------------------------------
diff --git a/tests/comparison/ORACLE.txt b/tests/comparison/ORACLE.txt
new file mode 100644
index 0000000..bc91dbb
--- /dev/null
+++ b/tests/comparison/ORACLE.txt
@@ -0,0 +1,90 @@
+Contents
+
+I. Note
+II. Oracle Database server setup
+III. Oracle client setup
+IV. Oracle connectivity, cx_Oracle installation
+TODO: config/setup
+TODO: data load/migrate
+
+I. Note
+
+Due to licensing, we cannot distribute Oracle client binaries, and of
+course we will not distribute Oracle Database. Please refer to these
+guidelines if you need to use Oracle as a reference database for the
+Random Query Generator.
+
+
+II. Oracle Database server setup
+
+There are lots of choices for Oracle Database. What follows are two
+suggestions:
+
+http://www.oracle.com/technetwork/database/database-technologies/express-edition/downloads/index.html
+
+https://hub.docker.com/r/wnameless/oracle-xe-11g/
+
+Follow their instructions for installation and setup. If you can use
+sqlplus or some other shell to successfully "select 1 from dual;", then
+you're good to go here.
+
+
+III. Oracle client setup
+
+Use cx_Oracle.
+
+http://cx-oracle.sourceforge.net/
+
+Note: Installing this via pip or similar means won't work until you have
+first installed prerequisites.
+
+1. Install Oracle instant-client
+
+http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html
+
+After you choose your architecture, you need
+
+  1. the basic client
+  2. the SDK
+
+You will likely have to register on the Oracle site to download the
+instant client binaries. As of this writing, install instructions are at
+the bottom of the page once you make your architecture choice.
+
+2. Additional dependent packages in distribution
+
+It's possible either when trying to install cx_Oracle, import it, or use
+it, you will get some linker error. I cannot predict what libraries you
+do and don't have. Here's a list of Linux distributions and known
+package dependencies:
+
+Ubuntu 14.04.2 LTS
+libaio1
+
+3. Ensure impala-python is working. It may be enough to simply run:
+
+   $ impala-python
+
+   Getting it working is out of scope of this document.
+
+4. Run these commands to install cx_Oracle into the impala-python
+   virtual environment:
+
+   $ source "${IMPALA_HOME}"/infra/python/env/bin/activate
+   (env)$ pip install cx_Oracle==5.2.1
+   (env)$ deactivate
+
+If pip install fails, double check the following:
+  * ORACLE_HOME environment variable
+  * LD_LIBRARY_PATH environment variable
+
+BUILD.txt packaged with cx_Oracle has more instructions for installing
+cx_Oracle.
+
+
+IV. Oracle connectivity, cx_Oracle installation
+
+Verify cx_Oracle can connect to your Oracle Database. Edit appopriately
+and then run:
+
+   $ "${IMPALA_HOME}"/tests/comparison/util/verify-oracle-connection.py

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/05b19bb8/tests/comparison/README
----------------------------------------------------------------------
diff --git a/tests/comparison/README b/tests/comparison/README
index acb85b8..579a8cf 100644
--- a/tests/comparison/README
+++ b/tests/comparison/README
@@ -22,6 +22,9 @@ Requirements:
      sudo apt-get install python-mysqldb
      sudo apt-get install python-psycopg2   # Postgresql
 
+Please see the supplemental ORACLE.txt on setting up Oracle as a reference
+database.
+
 
 Usage:
 
@@ -74,7 +77,9 @@ Miscellaneous:
 
 Postgresql:
 
-1) Supports basically all Impala language features
+1) Supports basically all Impala language features. Exceptions include:
+
+   a) IGNORE NULLS clause for analytic functions
 
 2) Has strange sorting of strings, '-1' > '1'. This may be important if ORDER 
BY is ever
    used. The databases being compared would need to have the same collation, 
which is
@@ -86,6 +91,13 @@ MySQL:
 
 2) Has poor boolean support.
 
+Oracle:
+
+1) No Boolean support.
+
+2) Better analytic function support, e.g. "IGNORE NULLS" is supported.
+
+3) Strange oddities abound, like no LIMIT clause.
 
 Improvements:
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/05b19bb8/tests/comparison/util/verify-oracle-connection.py
----------------------------------------------------------------------
diff --git a/tests/comparison/util/verify-oracle-connection.py 
b/tests/comparison/util/verify-oracle-connection.py
new file mode 100755
index 0000000..3db4ed1
--- /dev/null
+++ b/tests/comparison/util/verify-oracle-connection.py
@@ -0,0 +1,63 @@
+#!/usr/bin/env impala-python
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# This is a script that allows a tester to test his Oracle setup and cx_Oracle
+# installation within his impala-python environment. It's meant to be super 
simple.  The
+# emphasis here is on a cx_Oracle connection and cursor with no other 
distractions, even
+# command line option parsing/handling. Modify the constants below and run:
+#
+#  $ ./verify-oracle-connection.py
+#
+# If you get an exception, something is wrong. If cx_Oracle was able to make a
+# connection to Oracle and run a simple query, you'll get a success message.
+#
+# Refer to ORACLE.txt for help.
+
+# Importing the whole module instead of doing selective import seems to help 
find linker
+# errors.
+import cx_Oracle
+
+# Host on which Oracle Database lies.
+HOST = '127.0.0.1'
+
+# The values below are default for the version of Oracle Database Express 
Edition
+# tested.  You may need to change these as needed:
+PORT = 1521
+SID = 'XE'
+USER = 'system'
+PASSWORD = 'oracle'
+
+
+def main():
+  TEST_QUERY = 'SELECT 1 FROM DUAL'
+  EXPECTED_RESULT = [(1,)]
+  dsn = cx_Oracle.makedsn(HOST, PORT, SID)
+  with cx_Oracle.connect(user=USER, password=PASSWORD, dsn=dsn) as conn:
+    try:
+      cursor = conn.cursor()
+      query = cursor.execute(TEST_QUERY)
+      rows = query.fetchall()
+    finally:
+      cursor.close()
+    assert rows == EXPECTED_RESULT
+  print 'success'
+
+
+if '__main__' == __name__:
+  main()

Reply via email to