Jon Roberts created HAWQ-1445:
---------------------------------
Summary: PXF JDBC Security, Extra Props, and Max Queries
Key: HAWQ-1445
URL: https://issues.apache.org/jira/browse/HAWQ-1445
Project: Apache HAWQ
Issue Type: Improvement
Components: PXF
Reporter: Jon Roberts
Assignee: Ed Espino
1) Security
The example has the username and password in the connection string.
LOCATION ('pxf://localhost:51200/demodb.myclass'
'?PROFILE=JDBC'
'&JDBC_DRIVER=com.mysql.jdbc.Driver'
'&DB_URL=jdbc:mysql://192.168.200.6:3306/demodb&USER=root&PASS=root'
)
This creates a security issue because anyone who can connect to the database
will be able to see the username and password for the JDBC connection.
I suggest replacing the credentials in the URL with a connection profile that
points to a file outside of the database. For Greenplum Database and S3, the
LOCATION syntax includes "config=/path/to/config_file", and the config_file
contains the S3 credentials. This seems like a good pattern to follow here too.
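For example, the DDL could point at a file outside the catalog instead of
embedding the credentials. The CONFIG option name and the file layout below are
purely illustrative; nothing like them exists in the JDBC profile yet:
LOCATION ('pxf://localhost:51200/demodb.myclass'
'?PROFILE=JDBC'
'&JDBC_DRIVER=com.mysql.jdbc.Driver'
'&DB_URL=jdbc:mysql://192.168.200.6:3306/demodb'
'&CONFIG=/etc/pxf/servers/demodb.properties'
)
The referenced file, readable only by the PXF service account, would then hold
the credentials, so users who can read the table definition see only the path:
user=root
password=root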
2) Extra Properties
Some JDBC drivers need many additional properties beyond the URL, and these
have to be set with put() calls on a java.util.Properties object. An example is
Oracle's defaultRowPrefetch property, which needs to be raised from its default
of 10 (sized for OLTP) to something larger like 2000, which is better suited to
data extracts.
Additionally, you will need the ability to set the isolation level, which is
done with setTransactionIsolation on the Connection; I don't believe you can
set this on the connection URL either. Many SQL Server and DB2 databases still
don't use snapshot isolation and rely on dirty reads instead to avoid blocking
locks. The configuration file I suggested above will need an "extra properties"
variable that is a delimited list of key/value pairs so multiple extra
properties can be added.
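A rough sketch of how the plugin could apply such a list when it opens the
connection. The delimiter, the extraProps format, and the class/method names
are hypothetical; Properties, DriverManager.getConnection, and
setTransactionIsolation are the standard JDBC calls mentioned above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class JdbcConnectionFactory {

    // Opens a connection using credentials from the proposed config file plus
    // an "extra properties" list such as "defaultRowPrefetch=2000".
    public static Connection open(String url, String user, String pass,
                                  String extraProps, int isolationLevel)
            throws Exception {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", pass);

        // Each key=value pair in the delimited list becomes a driver property.
        if (extraProps != null && !extraProps.isEmpty()) {
            for (String pair : extraProps.split(",")) {
                String[] kv = pair.split("=", 2);
                props.setProperty(kv[0].trim(), kv[1].trim());
            }
        }

        Connection conn = DriverManager.getConnection(url, props);
        // For example Connection.TRANSACTION_READ_UNCOMMITTED for sources that
        // rely on dirty reads to avoid blocking locks.
        conn.setTransactionIsolation(isolationLevel);
        return conn;
    }
}

A call might look like open(dbUrl, user, pass, "defaultRowPrefetch=2000",
Connection.TRANSACTION_READ_UNCOMMITTED).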
3) Max Queries
The external table definition doesn't limit how many concurrent queries can be
executed against the remote server. It would be pretty simple to create a
single external table using PXF JDBC that issues thousands of concurrent
queries against a single source database from one SELECT in HAWQ.
Initially, we should add a max_queries variable to the configuration file I'm
suggesting and reject a query when it would request more concurrent PXF
instances than max_queries allows. Longer term, we should implement a queueing
system so we can support external tables that partition data from the source at
a very small grain without killing the source database.
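A minimal sketch of the initial max_queries behavior, assuming the value is
read from the proposed configuration file; the class and method names are
hypothetical, and java.util.concurrent.Semaphore is just one way to count
in-flight queries:

import java.util.concurrent.Semaphore;

public class QueryThrottle {

    private final Semaphore slots;

    public QueryThrottle(int maxQueries) {
        // One permit per concurrent query allowed against the source database.
        this.slots = new Semaphore(maxQueries);
    }

    // Called before a PXF instance opens a query against the source.
    // Rejects immediately when max_queries is already in use; the longer-term
    // queueing system would wait here instead of throwing.
    public void acquireOrReject() {
        if (!slots.tryAcquire()) {
            throw new IllegalStateException(
                    "max_queries exceeded for this JDBC source; rejecting request");
        }
    }

    // Called when the query for that PXF instance finishes.
    public void release() {
        slots.release();
    }
}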