Hi Gokul,
On Fri, Jun 10, 2016 at 2:08 PM, Gokul Balakrishnan <[email protected]> wrote:

> Hi all,
>
> In DAS 3.0.x, for interacting with relational databases directly from
> Spark (i.e. bypassing the data access layer), we have hitherto been using
> the JDBC connector that comes directly with Apache Spark (with added
> support for Carbon datasources).
>
> This connector has had many issues that have been detrimental to a
> proper user experience, including:
>
> - Having to create tables on the RDBMS beforehand, prior to query execution
> - Tables getting dropped and re-created with a Spark-dictated schema
>   during initialization
> - No support for RDBMS unique keys
> - Not being able to perform INSERT INTO queries on RDBMS tables which
>   have unique keys set; as a result, the user has had to depend upon
>   INSERT OVERWRITE, which clears the table and thus causes the loss of
>   historical data
>
> I have been working on overhauling this connector over the past couple of
> weeks to address the above flaws and bring it up to scratch. A new config
> file which contains the relevant information for a particular RDBMS
> flavour (such as parameterised query formats, datatypes etc.) has also
> been introduced. An overview of all improvements is as follows:
>
> - RDBMS tables will be created dynamically (based on the schema provided
>   by the user) if they don't exist already

What is the data type to be used for fields in the schema? Is it SQL types or data bridge data types? Could you please provide a sample create table query?

> - Pre-existing tables will be appropriated for use without
>   dropping/recreating
> - Recognition of primary keys and switching between INSERT/UPSERT modes
>   automatically during Spark's INSERT INTO calls
> - Support for creating DB indices, based on an additional input parameter
> - Spark INSERT OVERWRITE calls can be used to clear the existing table
>   without the existing schema/index definitions being affected.
> This initial implementation can be found at [1]. It's written mostly in
> Scala.
>
> Initially, we've tested the connector against MySQL as part of the first
> cut, and we will be testing against all DBs supported by DAS over the
> following days. The connector is expected to be shipped with the DAS
> 3.1.0 release.
>
> Thoughts welcome.
>
> [1] https://github.com/wso2/carbon-analytics/pull/187
>
> Thanks,
>
> --
> Gokul Balakrishnan
> Senior Software Engineer,
> WSO2, Inc. http://wso2.com
> M +94 77 5935 789 | +44 7563 570502
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
Thanks & Regards,

Inosh Goonewardena
Associate Technical Lead - WSO2 Inc.
Mobile: +94779966317
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
