Jon Roberts created HAWQ-1445:
---------------------------------

             Summary: PXF JDBC Security, Extra Props, and Max Queries
                 Key: HAWQ-1445
                 URL: https://issues.apache.org/jira/browse/HAWQ-1445
             Project: Apache HAWQ
          Issue Type: Improvement
          Components: PXF
            Reporter: Jon Roberts
            Assignee: Ed Espino


1) Security
The example has the username and password in the connection string.  

LOCATION ('pxf://localhost:51200/demodb.myclass'
              '?PROFILE=JDBC'
              '&JDBC_DRIVER=com.mysql.jdbc.Driver'
              
'&DB_URL=jdbc:mysql://192.168.200.6:3306/demodb&USER=root&PASS=root'
              )

This creates  security issue because anyone that can connect to the database 
will be able to see the username and password of the JDBC connection.

I suggest changing the URL to a connection profile that points to a file 
outside of the database.  For Greenplum database and S3, the LOCATION syntax 
includes "config=/path/to/config_file".  The config_file contains the S3 
credentials.  This seems like a good pattern to follow here too.

2) Extra Properties
Some JDBC drivers will need many additional properties beyond the URL and this 
requires setting it with a put to a Properties variable.  An example of this is 
Oracle's defaultRowPrefetch property that needs to be updated from the default 
of 10 which is designed for OLTP to something larger like 2000 which is more 
ideal for data extracts.  

Additionally, you will need the ability to set the isolation level which is 
done with setTransactionIsolation on the Connection.  I don't believe you can 
set this on the connection URL either.  Many SQL Server and DB2 database still 
don't use snapshot isolation and use dirty reads instead to prevent blocking 
locks.  The configuration file I suggested above will need an "extra 
properties" variable that is a delimited list of key/value pairs so you can add 
multiple extra properties.

3) Max Queries
The external table definition doesn't limit how many concurrent queries can be 
executed on the remote server.  It would be pretty simple to create a single 
external table using PXF JDBC that would issue thousands of concurrent queries 
to a single source database when doing a single SELECT in HAWQ.

Initially, we should add a max_queries variable to the configuration file that 
I'm suggesting, that will reject queries from proceeding when a greater number 
of PXF instances are being requested than the max_queries variable.  Longer 
term, we should implement a queueing system so we can support external tables 
that partitions data from the source at a very small grain but without killing 
the source database.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to