Hi all,

In DAS 3.0.x, for interacting with relational databases directly from Spark (i.e. bypassing the data access layer), we have hitherto been using the JDBC connector that comes directly with Apache Spark (with added support for Carbon datasources).
This connector has had many issues that have been detrimental to a proper user experience, including:

- Having to create tables on the RDBMS beforehand, prior to query execution
- Tables getting dropped and re-created with a Spark-dictated schema during initialization
- No support for RDBMS unique keys
- Not being able to perform INSERT INTO queries on RDBMS tables which have unique keys set, and as a result the user having to depend upon INSERT OVERWRITE, which clears the table. This would result in the loss of historical data.

I have been working on overhauling this connector over the past couple of weeks to address the above flaws and bring it up to scratch. A new config file, which contains the relevant information for a particular RDBMS flavour (such as parameterised query formats, datatypes etc.), has also been introduced.

An overview of all improvements is as follows:

- RDBMS tables will be created dynamically (based on the schema provided by the user) if they don't exist already
- Pre-existing tables will be appropriated for use without dropping/recreating
- Recognition of primary keys and switching between INSERT/UPSERT modes automatically during Spark's INSERT INTO calls
- Support for creating DB indices, based on an additional input parameter
- Spark INSERT OVERWRITE calls can be used to clear the existing table without existing schema/index definitions being affected

This initial implementation can be found at [1]. It's written mostly in Scala. We've tested the connector against MySQL as part of the first cut, and we will be testing against all DBs supported by DAS over the following days.

The connector is expected to be shipped with the DAS 3.1.0 release.

Thoughts welcome.

[1] https://github.com/wso2/carbon-analytics/pull/187

Thanks,

--
Gokul Balakrishnan
Senior Software Engineer,
WSO2, Inc. http://wso2.com
M +94 77 5935 789 | +44 7563 570502
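
For illustration only, the behaviour listed above (create the table if it is missing, switch between INSERT and UPSERT depending on whether a primary key exists, and clear rows on INSERT OVERWRITE without touching the schema or indices) can be sketched roughly as below. This is not the connector's code, which is in Scala at [1]. It uses Python's built-in sqlite3 module purely so that it runs on its own. The function names (ensure_table, primary_key_columns, insert_into, insert_overwrite) are hypothetical, and SQLite's INSERT OR REPLACE stands in for whatever per-flavour upsert format the config file would supply.

```python
import sqlite3


def ensure_table(conn, table, schema, primary_keys=(), indices=()):
    """Create the table and its indices only if missing; never drop anything."""
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in schema)
    pk = f", PRIMARY KEY ({', '.join(primary_keys)})" if primary_keys else ""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols}{pk})")
    for col in indices:
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS idx_{table}_{col} ON {table} ({col})"
        )


def primary_key_columns(conn, table):
    """Return the primary key column names, in key order (SQLite-specific)."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk),
    # where pk is the 1-based position in the primary key, or 0 if not in it.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [r[1] for r in sorted(rows, key=lambda r: r[5]) if r[5] > 0]


def insert_into(conn, table, columns, rows):
    """Plain INSERT without a primary key, upsert when one is defined."""
    # Assumption: "INSERT OR REPLACE" is SQLite's form; other RDBMS flavours
    # would need their own statement (hence a per-flavour query config).
    verb = "INSERT OR REPLACE" if primary_key_columns(conn, table) else "INSERT"
    placeholders = ", ".join("?" for _ in columns)
    sql = f"{verb} INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    return verb


def insert_overwrite(conn, table, columns, rows):
    """Clear existing rows, keeping schema and index definitions intact."""
    conn.execute(f"DELETE FROM {table}")
    return insert_into(conn, table, columns, rows)


conn = sqlite3.connect(":memory:")
ensure_table(
    conn,
    "stats",
    [("host", "TEXT"), ("hits", "INTEGER")],
    primary_keys=["host"],
    indices=["hits"],
)
insert_into(conn, "stats", ["host", "hits"], [("a", 1), ("b", 2)])
# Same key again: the existing row is replaced rather than the insert failing.
insert_into(conn, "stats", ["host", "hits"], [("a", 5)])
print(conn.execute("SELECT host, hits FROM stats ORDER BY host").fetchall())
# prints [('a', 5), ('b', 2)]
```

The point of the sketch is that historical rows survive repeated INSERT INTO calls, and that clearing a table is a row-level delete rather than a drop and re-create.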
_______________________________________________
Architecture mailing list
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
