Hi Gokul,
On Fri, Jun 10, 2016 at 2:08 PM, Gokul Balakrishnan <[email protected]> wrote:

> Hi all,
>
> In DAS 3.0.x, for interacting with relational databases directly from
> Spark (i.e. bypassing the data access layer), we have hitherto been using
> the JDBC connector that comes directly with Apache Spark (with added
> support for Carbon datasources).
>
> This connector has had many issues that have been detrimental to a
> proper user experience, including:
>
> - Having to create tables on the RDBMS beforehand, prior to query execution
> - Tables getting dropped and re-created with a Spark-dictated schema
>   during initialization
> - No support for RDBMS unique keys
> - Not being able to perform INSERT INTO queries on RDBMS tables which
>   have unique keys set; as a result, the user has had to depend upon
>   INSERT OVERWRITE, which clears the table and thus causes the loss of
>   historical data
>
> I have been working on overhauling this connector over the past couple of
> weeks to address the above flaws and bring it up to scratch. A new config
> file which contains the relevant information for a particular RDBMS
> flavour (such as parameterised query formats, datatypes etc.) has also
> been introduced. An overview of all improvements is as follows:
>
> - RDBMS tables will be created dynamically (based on the schema provided
>   by the user) if they don't exist already

What is the data type to be used for fields in the schema? Is it SQL types or data bridge data types? Could you please provide a sample create table query?

> - Pre-existing tables will be appropriated for use without
>   dropping/recreating
> - Recognition of primary keys and switching between INSERT/UPSERT modes
>   automatically during Spark's INSERT INTO calls
> - Support for creating DB indices, based on an additional input parameter
> - Spark INSERT OVERWRITE calls can be used to clear the existing table
>   without the existing schema/index definitions being affected.
> This initial implementation can be found at [1]. It's written mostly in
> Scala.
>
> Initially, we've tested the connector against MySQL as part of the first
> cut, and we will be testing against all DBs supported by DAS over the
> following days. The connector is expected to be shipped with the DAS
> 3.1.0 release.
>
> Thoughts welcome.
>
> [1] https://github.com/wso2/carbon-analytics/pull/187
>
> Thanks,
>
> --
> Gokul Balakrishnan
> Senior Software Engineer,
> WSO2, Inc. http://wso2.com
> M +94 77 5935 789 | +44 7563 570502
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
Thanks & Regards,

Inosh Goonewardena
Associate Technical Lead - WSO2 Inc.
Mobile: +94779966317
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
