Hi Gokul,

Can you please share a couple of sample Spark SQL queries that use this updated CarbonJDBC connector?
Regards,
Gihan

On Fri, Jun 10, 2016 at 2:08 PM, Gokul Balakrishnan <[email protected]> wrote:

> Hi all,
>
> In DAS 3.0.x, for interacting with relational databases directly from
> Spark (i.e. bypassing the data access layer), we have hitherto been using
> the JDBC connector that comes directly with Apache Spark (with added
> support for Carbon datasources).
>
> This connector has contained many issues that have been detrimental to a
> proper user experience, including:
>
> - Having to create tables on the RDBMS beforehand, prior to query execution
> - Tables getting dropped and re-created with a Spark-dictated schema
> during initialization
> - No support for RDBMS unique keys
> - Not being able to perform INSERT INTO queries on RDBMS tables which have
> unique keys set; as a result, the user has had to depend on INSERT
> OVERWRITE, which clears the table and results in the loss of historical
> data
>
> I have been working on overhauling this connector over the past couple of
> weeks to address the above flaws and bring it up to scratch. A new config
> file, which contains the relevant information for a particular RDBMS
> flavour (such as parameterised query formats, datatypes, etc.), has also
> been introduced. An overview of all improvements is as follows:
>
> - RDBMS tables will be created dynamically (based on the schema provided
> by the user) if they don't already exist
> - Pre-existing tables will be appropriated for use without being
> dropped/recreated
> - Primary keys are recognised, and the connector automatically switches
> between INSERT/UPSERT modes during Spark's INSERT INTO calls
> - Support for creating DB indices, based on an additional input parameter
> - Spark INSERT OVERWRITE calls can be used to clear the existing table
> without affecting the existing schema/index definitions
>
> This initial implementation can be found at [1]. It's written mostly in
> Scala.
>
> Initially, we've tested the connector against MySQL as part of the first
> cut, and we will be testing against all DBs supported by DAS over the
> following days. The connector is expected to be shipped with the DAS 3.1.0
> release.
>
> Thoughts welcome.
>
> [1] https://github.com/wso2/carbon-analytics/pull/187
>
> Thanks,
>
> --
> Gokul Balakrishnan
> Senior Software Engineer,
> WSO2, Inc. http://wso2.com
> M +94 77 5935 789 | +44 7563 570502
>
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>

--
W.G. Gihan Anuruddha
Senior Software Engineer | WSO2, Inc.
M: +94772272595
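For context, the kind of Spark SQL usage being asked about would presumably look something like the sketch below, based on the features described in the quoted message. The table name, datasource name, and the option keys (dataSource, tableName, schema, primaryKeys) are illustrative assumptions, not confirmed connector syntax:

```sql
-- Hypothetical sketch: map a Spark temporary table onto an RDBMS table via
-- the CarbonJDBC provider. If the table doesn't exist on the RDBMS, the
-- connector would create it from the user-supplied schema; if it does, it
-- would be used as-is rather than dropped and re-created.
CREATE TEMPORARY TABLE people
USING CarbonJDBC
OPTIONS (
  dataSource "WSO2_ANALYTICS_RDBMS",   -- assumed datasource name
  tableName "PEOPLE",
  schema "id INTEGER, name STRING, age INTEGER",
  primaryKeys "id"
);

-- With a primary key declared, INSERT INTO would take UPSERT semantics
-- instead of failing on duplicate keys.
INSERT INTO TABLE people SELECT id, name, age FROM staging_people;

-- INSERT OVERWRITE would clear the table's contents without affecting the
-- schema/index definitions on the RDBMS side.
INSERT OVERWRITE TABLE people SELECT id, name, age FROM staging_people;
```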
