Hi Gokul,

On Fri, Jun 10, 2016 at 4:11 PM, Gokul Balakrishnan <[email protected]> wrote:

> Hi Gihan/Inosh,
>
> A sample statement for creating a temporary table using this provider
> would look like the following:
>
> CREATE TEMPORARY TABLE StateUsage USING CarbonJDBC OPTIONS (dataSource
> "MY_DATASOURCE", tableName "state_usage", schema "us_state STRING -i,
> polarity INTEGER, usage_avg FLOAT", primaryKeys "us_state");
>
> The dataSource and tableName parameters are unchanged, while two new
> options have been added.
>
> The "schema" option is required, and is used to specify the schema to be
> utilised throughout the temporary table's lifetime. Here, the field types
> used for the schema match what we have for the CarbonAnalytics provider
> (i.e. neither JDBC nor databridge types), and correspond to Spark catalyst
> types. Moreover, if the optional "-i" flag is specified for a field, an
> RDBMS index will be created for that field.
>
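To make the schema-string format concrete, here is a minimal sketch of how such a string might be interpreted. The `parse_schema` helper is purely illustrative and is not part of the connector's actual code; only the string format itself comes from the example above.

```python
# Illustrative sketch only: parses a CarbonJDBC-style schema string such as
# "us_state STRING -i, polarity INTEGER, usage_avg FLOAT" into
# (name, type, indexed) tuples. The real connector's parsing may differ.

def parse_schema(schema: str):
    fields = []
    for entry in schema.split(","):
        tokens = entry.split()
        name, field_type = tokens[0], tokens[1]
        indexed = "-i" in tokens[2:]  # the optional "-i" marks an indexed field
        fields.append((name, field_type, indexed))
    return fields

print(parse_schema("us_state STRING -i, polarity INTEGER, usage_avg FLOAT"))
```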

Can't we make 'schema' optional, as it was earlier? Otherwise this
introduces a backward-incompatible change.

>
> The "primaryKeys" option is not mandatory, and may be used to denote
> unique key fields in the underlying RDBMS table. Based on this option,
> either INSERT or UPSERT queries will be chosen when executing Spark
> INSERT INTO queries, as explained above.
>
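The INSERT-versus-UPSERT distinction can be sketched as follows, using an in-memory list of rows in place of an RDBMS table. This is a hypothetical illustration of the described behaviour, not the connector's implementation; the `insert_into` name is made up.

```python
# Illustrative sketch: when a primary key is declared, INSERT INTO behaves
# as an upsert (update-or-insert), so existing rows are updated in place and
# historical data is never cleared. Without a primary key, rows are simply
# appended. A list of dicts stands in for the RDBMS table.

def insert_into(table, rows, primary_key=None):
    if primary_key is None:
        table.extend(rows)                    # plain INSERT: append everything
    else:
        by_key = {row[primary_key]: row for row in table}
        for row in rows:
            by_key[row[primary_key]] = row    # UPSERT: replace or add by key
        table[:] = list(by_key.values())

usage = [{"us_state": "CA", "usage_avg": 1.0}]
insert_into(usage, [{"us_state": "CA", "usage_avg": 2.5}], primary_key="us_state")
print(usage)  # the CA row is updated in place, not duplicated
```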
> We're in the process of documenting the usage patterns of this provider so
> that they can be better understood.
>
> Thanks,
>
> On 10 June 2016 at 15:16, Inosh Goonewardena <[email protected]> wrote:
>
>> Hi Gokul,
>>
>>
>> On Fri, Jun 10, 2016 at 2:08 PM, Gokul Balakrishnan <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> In DAS 3.0.x, for interacting with relational databases directly from
>>> Spark (i.e. bypassing the data access layer), we have hitherto been using
>>> the JDBC connector that ships with Apache Spark (with added support for
>>> Carbon datasources).
>>>
>>> This connector has a number of issues that are detrimental to the user
>>> experience, including:
>>>
>>> - Having to create tables on the RDBMS manually, prior to query
>>> execution
>>> - Tables getting dropped and re-created with a Spark-dictated schema
>>> during initialization
>>> - No support for RDBMS unique keys
>>> - Not being able to perform INSERT INTO queries on RDBMS tables which
>>> have unique keys set, forcing the user to depend on INSERT OVERWRITE,
>>> which clears the table and results in the loss of historical data
>>>
>>> I have been working on overhauling this connector over the past couple
>>> of weeks to address the above flaws and bring it up to scratch. A new
>>> config file which contains the relevant information for a particular
>>> RDBMS flavour (such as parameterised query formats, datatypes, etc.) has
>>> also been introduced. An overview of all improvements is as follows:
>>>
>>> - RDBMS tables will be created dynamically (based on the schema provided
>>> by the user) if they don't exist already
>>>
>>
>> What are the data types to be used for fields in the schema? Are they
>> SQL types or data bridge data types? Could you please provide a sample
>> create table query?
>>
>>
>>> - Pre-existing tables will be reused without being dropped/recreated
>>> - Recognition of primary keys and automatic switching between
>>> INSERT/UPSERT modes during Spark's INSERT INTO calls
>>> - Support for creating DB indices, based on an additional input parameter
>>> - Spark INSERT OVERWRITE calls can be used to clear the existing table
>>> without affecting existing schema/index definitions
>>>
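The dynamic table creation described in the first bullet could be sketched as below. This is a hypothetical illustration: the Spark-catalyst-to-SQL type mapping shown is an assumption for MySQL, and in the actual connector such mappings and query formats live in the per-RDBMS config file mentioned above.

```python
# Illustrative sketch: builds a CREATE TABLE IF NOT EXISTS statement from a
# parsed schema, so pre-existing tables are left untouched. TYPE_MAP is an
# assumed MySQL-flavoured mapping; the real connector reads its type mappings
# and query formats from its RDBMS config file.

TYPE_MAP = {"STRING": "VARCHAR(255)", "INTEGER": "INT", "FLOAT": "FLOAT"}

def create_table_ddl(table_name, fields, primary_keys=()):
    # fields are (name, catalyst_type, indexed) tuples
    columns = ", ".join(f"{name} {TYPE_MAP[ftype]}" for name, ftype, _ in fields)
    if primary_keys:
        columns += ", PRIMARY KEY (" + ", ".join(primary_keys) + ")"
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({columns})"

fields = [("us_state", "STRING", True), ("polarity", "INTEGER", False)]
print(create_table_ddl("state_usage", fields, primary_keys=("us_state",)))
```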
>>> This initial implementation can be found at [1]. It's written mostly in
>>> Scala.
>>>
>>> We've tested the connector against MySQL as a first cut, and we will be
>>> testing against all DBs supported by DAS over the following days. The
>>> connector is expected to ship with the DAS 3.1.0 release.
>>>
>>> Thoughts welcome.
>>>
>>> [1] https://github.com/wso2/carbon-analytics/pull/187
>>>
>>> Thanks,
>>>
>>> --
>>> Gokul Balakrishnan
>>> Senior Software Engineer,
>>> WSO2, Inc. http://wso2.com
>>> M +94 77 5935 789 | +44 7563 570502
>>>
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> [email protected]
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>>
>> Inosh Goonewardena
>> Associate Technical Lead- WSO2 Inc.
>> Mobile: +94779966317
>>
>>
>>
>
>
>


-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/