Hi all,

In DAS 3.0.x, we have so far been using the JDBC connector bundled with
Apache Spark (with added support for Carbon datasources) to interact with
relational databases directly from Spark, i.e. bypassing the data access
layer.

This connector has a number of issues that have been detrimental to the
user experience, including:

- Having to create tables on the RDBMS before executing queries
- Tables getting dropped and re-created with a Spark-dictated schema during
initialization
- No support for RDBMS unique keys
- Not being able to perform INSERT INTO queries on RDBMS tables which have
unique keys set. As a result, users had to depend on INSERT OVERWRITE,
which clears the table and therefore loses historical data

I have been overhauling this connector over the past couple of weeks to
address the above flaws. A new config file containing RDBMS
flavour-specific information (such as parameterised query formats,
datatypes, etc.) has also been introduced. An overview of all improvements
is as follows:

- RDBMS tables will be created dynamically (based on the schema provided by
the user) if they don't exist already
- Pre-existing tables will be reused as-is, without being dropped and
re-created
- Primary keys will be recognised, and Spark's INSERT INTO calls will
switch between INSERT and UPSERT modes automatically
- Support for creating DB indices, based on an additional input parameter
- Spark INSERT OVERWRITE calls can be used to clear the existing table
without existing schema/index definitions being affected.
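To illustrate how a flavour-specific config can drive the behaviour above,
here is a minimal Scala sketch. Everything in it is hypothetical: the
FlavourTemplates structure, the {{...}} placeholder convention and the
MySQL-style templates are assumptions for illustration only, not the
actual config format or code in [1].

```scala
// Hypothetical sketch: names, placeholders and templates are assumptions,
// not the format used by the actual DAS connector.

/** Per-flavour SQL templates, standing in for the RDBMS config file. */
final case class FlavourTemplates(
    createTable: String,
    insert: String,
    upsert: String,
    createIndex: String,
    truncate: String,
    typeMapping: Map[String, String] // Spark type name -> RDBMS column type
)

final case class Column(name: String, sparkType: String)

object RdbmsSketch {
  // Assumed MySQL-style templates, for illustration only.
  val mysql: FlavourTemplates = FlavourTemplates(
    createTable = "CREATE TABLE IF NOT EXISTS {{TABLE}} ({{COLUMNS}}{{PRIMARY_KEY}})",
    insert = "INSERT INTO {{TABLE}} ({{COLUMNS}}) VALUES ({{PLACEHOLDERS}})",
    upsert = "INSERT INTO {{TABLE}} ({{COLUMNS}}) VALUES ({{PLACEHOLDERS}}) " +
      "ON DUPLICATE KEY UPDATE {{UPDATES}}",
    createIndex = "CREATE INDEX {{INDEX}} ON {{TABLE}} ({{COLUMNS}})",
    truncate = "TRUNCATE TABLE {{TABLE}}",
    typeMapping = Map(
      "string" -> "VARCHAR(255)",
      "int" -> "INTEGER",
      "long" -> "BIGINT",
      "double" -> "DOUBLE"
    )
  )

  // Substitute each {{KEY}} placeholder in the template with its value.
  private def fill(template: String, values: (String, String)*): String =
    values.foldLeft(template) { case (acc, (k, v)) => acc.replace(s"{{$k}}", v) }

  /** Created only if missing, so pre-existing tables are left untouched. */
  def createTableSql(t: FlavourTemplates, table: String,
                     cols: Seq[Column], primaryKeys: Seq[String]): String = {
    // Throws NoSuchElementException for unmapped Spark types (kept simple).
    val colDefs = cols.map(c => s"${c.name} ${t.typeMapping(c.sparkType)}").mkString(", ")
    val pk =
      if (primaryKeys.isEmpty) ""
      else primaryKeys.mkString(", PRIMARY KEY (", ", ", ")")
    fill(t.createTable, "TABLE" -> table, "COLUMNS" -> colDefs, "PRIMARY_KEY" -> pk)
  }

  /** Plain INSERT when there are no primary keys, UPSERT otherwise. */
  def writeSql(t: FlavourTemplates, table: String,
               cols: Seq[Column], primaryKeys: Seq[String]): String = {
    val names = cols.map(_.name)
    val colList = names.mkString(", ")
    val placeholders = names.map(_ => "?").mkString(", ")
    if (primaryKeys.isEmpty)
      fill(t.insert, "TABLE" -> table, "COLUMNS" -> colList, "PLACEHOLDERS" -> placeholders)
    else {
      val nonKeys = names.filterNot(primaryKeys.contains)
      // If every column is a key, use a no-op "k = k" update so the SQL stays valid.
      val updates =
        if (nonKeys.isEmpty) primaryKeys.map(k => s"$k = $k").mkString(", ")
        else nonKeys.map(n => s"$n = VALUES($n)").mkString(", ")
      fill(t.upsert, "TABLE" -> table, "COLUMNS" -> colList,
        "PLACEHOLDERS" -> placeholders, "UPDATES" -> updates)
    }
  }

  /** Index on the columns named by a (hypothetical) extra input parameter. */
  def createIndexSql(t: FlavourTemplates, table: String, indexCols: Seq[String]): String =
    fill(t.createIndex, "INDEX" -> s"idx_${table}_${indexCols.mkString("_")}",
      "TABLE" -> table, "COLUMNS" -> indexCols.mkString(", "))

  /** INSERT OVERWRITE clears rows only; schema and indices are kept. */
  def clearTableSql(t: FlavourTemplates, table: String): String =
    fill(t.truncate, "TABLE" -> table)
}
```

For example, with columns id (long) and msg (string) and a primary key on
id, createTableSql produces "CREATE TABLE IF NOT EXISTS events (id BIGINT,
msg VARCHAR(255), PRIMARY KEY (id))" and writeSql produces an
INSERT ... ON DUPLICATE KEY UPDATE statement; with no primary key it falls
back to a plain parameterised INSERT. Note that VALUES(col) in the update
clause is deprecated as of MySQL 8.0.20, and other flavours (e.g.
PostgreSQL's ON CONFLICT ... DO UPDATE) would need their own templates.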

This initial implementation can be found at [1]. It's written mostly in
Scala.

As part of this first cut, we have tested the connector against MySQL, and
we will be testing it against all other DBs supported by DAS over the
coming days. The connector is expected to ship with the DAS 3.1.0 release.

Thoughts welcome.

[1] https://github.com/wso2/carbon-analytics/pull/187

Thanks,

-- 
Gokul Balakrishnan
Senior Software Engineer,
WSO2, Inc. http://wso2.com
M +94 77 5935 789 | +44 7563 570502
_______________________________________________
Architecture mailing list
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
