Hi Gihan/Inosh,

A sample statement for creating a temporary table using this provider would
look like the following:

CREATE TEMPORARY TABLE StateUsage USING CarbonJDBC OPTIONS (dataSource
"MY_DATASOURCE", tableName "state_usage", schema "us_state STRING -i,
polarity INTEGER, usage_avg FLOAT", primaryKeys "us_state");

The dataSource and tableName parameters are unchanged, while two new
options have been added.

The "schema" option is required, and is used to specify the schema to be
utilised throughout the temporary table's lifetime. Here, the field types
used for the schema match those of the CarbonAnalytics provider (i.e.
neither JDBC nor databridge types), and correspond to Spark Catalyst
types. Moreover, specifying the optional "-i" keyword after a field will
create an RDBMS index for that field.
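To make the shape of the "schema" option concrete, here is a rough sketch
of how such a string could be parsed into (field, type, indexed) triples.
This is purely illustrative and not the actual connector code (which is
written in Scala); it only assumes the "name TYPE [-i]" grammar shown in
the sample statement above.

```python
# Illustrative sketch only, not the connector's implementation:
# parse a CarbonJDBC "schema" option string such as
#   "us_state STRING -i, polarity INTEGER, usage_avg FLOAT"
# into (name, type, indexed) triples.

def parse_schema(schema):
    fields = []
    for part in schema.split(","):
        tokens = part.split()
        name, ftype = tokens[0], tokens[1]
        # the optional "-i" keyword marks the field for an RDBMS index
        indexed = "-i" in tokens[2:]
        fields.append((name, ftype, indexed))
    return fields

print(parse_schema("us_state STRING -i, polarity INTEGER, usage_avg FLOAT"))
# [('us_state', 'STRING', True), ('polarity', 'INTEGER', False), ('usage_avg', 'FLOAT', False)]
```

The indexed fields would then translate into CREATE INDEX statements (in
the relevant RDBMS dialect) alongside the dynamically created table.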

The "primaryKeys" option is not mandatory, and may be used to denote
unique key fields in the underlying RDBMS table. Based on this option,
either INSERT or UPSERT queries will be chosen when executing Spark
INSERT INTO queries, as explained in my earlier mail quoted below.
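As an illustration of the INSERT/UPSERT switch, the following sketch (not
the connector's actual code, which is in Scala) shows how a write query
could be chosen based on whether primary keys were supplied. The MySQL
"ON DUPLICATE KEY UPDATE" form is used here as an example; other RDBMS
flavours would take their query formats from the per-flavour config file
mentioned in this thread.

```python
# Illustrative sketch only: choose between plain INSERT and a MySQL-style
# UPSERT depending on whether primary keys are defined for the table.

def build_write_query(table, columns, primary_keys=None):
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    insert = f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
    if not primary_keys:
        # no unique keys: a plain INSERT suffices
        return insert
    # unique keys present: update the non-key columns on conflict
    updates = ", ".join(
        f"{c} = VALUES({c})" for c in columns if c not in primary_keys
    )
    return f"{insert} ON DUPLICATE KEY UPDATE {updates}"

print(build_write_query("state_usage",
                        ["us_state", "polarity", "usage_avg"],
                        primary_keys=["us_state"]))
```

This is what allows Spark INSERT INTO calls to work against tables with
unique keys, instead of forcing users onto INSERT OVERWRITE.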

We're in the process of documenting the usage patterns of this provider so
that they can be better understood.

Thanks,

On 10 June 2016 at 15:16, Inosh Goonewardena <[email protected]> wrote:

> Hi Gokul,
>
>
> On Fri, Jun 10, 2016 at 2:08 PM, Gokul Balakrishnan <[email protected]>
> wrote:
>
>> Hi all,
>>
>> In DAS 3.0.x, for interacting with relational databases directly from
>> Spark (i.e. bypassing the data access layer), we have hitherto been using
>> the JDBC connector that comes directly with Apache Spark (with added
>> support for Carbon datasources).
>>
>> This connector has a number of issues that have been detrimental to
>> proper user experience, including:
>>
>> - Having to create tables on the RDBMS beforehand, prior to query
>> execution
>> - Tables getting dropped and re-created with a Spark-dictated schema
>> during initialization
>> - No support for RDBMS unique keys
>> - Not being able to perform INSERT INTO queries on RDBMS tables which
>> have unique keys set, and as a result the user having to depend upon INSERT
>> OVERWRITE which clears the table. This would result in the loss of
>> historical data
>>
>> I have been working on overhauling this connector over the past couple of
>> weeks to address the above flaws and bring it up to scratch. A new config
>> file, which contains the relevant information for each RDBMS flavour
>> (such as parameterised query formats, data types, etc.), has also been
>> introduced. An overview of all improvements is as follows:
>>
>> - RDBMS tables will be created dynamically (based on the schema provided
>> by the user) if they don't exist already
>>
>
> What is the data type to be used with fields in the schema? Is it SQL
> types or data bridge data types? Could you please provide a sample create
> table query.
>
>
>> - Pre-existing tables will be appropriated for use without
>> dropping/recreating
>> - Recognition of primary keys and switching between INSERT/UPSERT modes
>> automatically during Spark's INSERT INTO calls
>> - Support for creating DB indices, based on an additional input parameter
>> - Spark INSERT OVERWRITE calls can be used to clear the existing table
>> without existing schema/index definitions being affected.
>>
>> This initial implementation can be found at [1]. It's written mostly in
>> Scala.
>>
>> Initially, we've tested the connector against MySQL as part of the first
>> cut, and we will be testing against all DBs supported by DAS over the
>> following days. The connector is expected to be shipped with the DAS 3.1.0
>> release.
>>
>> Thoughts welcome.
>>
>> [1] https://github.com/wso2/carbon-analytics/pull/187
>>
>> Thanks,
>>
>> --
>> Gokul Balakrishnan
>> Senior Software Engineer,
>> WSO2, Inc. http://wso2.com
>> M +94 77 5935 789 | +44 7563 570502
>>
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> Thanks & Regards,
>
> Inosh Goonewardena
> Associate Technical Lead- WSO2 Inc.
> Mobile: +94779966317
>
>
>


-- 
Gokul Balakrishnan
Senior Software Engineer,
WSO2, Inc. http://wso2.com
M +94 77 5935 789 | +44 7563 570502
