Hi Gokul,

On Fri, Jun 10, 2016 at 4:11 PM, Gokul Balakrishnan <[email protected]> wrote:
> Hi Gihan/Inosh,
>
> A sample statement for creating a temporary table using this provider
> would look like the following:
>
> CREATE TEMPORARY TABLE StateUsage USING CarbonJDBC OPTIONS (dataSource
> "MY_DATASOURCE", tableName "state_usage", schema "us_state STRING -i,
> polarity INTEGER, usage_avg FLOAT", primaryKeys "us_state");
>
> The dataSource and tableName parameters are unchanged, while two new
> options have been added.
>
> The "schema" option is required, and is used to specify the schema to be
> used throughout the temporary table's lifetime. Here, the field types
> used for the schema match what we have for the CarbonAnalytics provider
> (i.e. neither JDBC nor databridge types), and correspond to Spark
> catalyst types. Moreover, if the optional "-i" keyword is specified for
> a field, an RDBMS index will be created for that field.

Can't we make 'schema' optional, as it was earlier? Otherwise this
introduces a backward-incompatible change.

> The "primaryKeys" option is not mandatory, and may be used to denote
> unique key fields in the underlying RDBMS table. Based on this option,
> either INSERT or UPSERT queries will be chosen when executing Spark
> INSERT INTO queries, as explained above.
>
> We're in the process of documenting the usage patterns of this provider
> so that they can be better understood.
>
> Thanks,
>
> On 10 June 2016 at 15:16, Inosh Goonewardena <[email protected]> wrote:
>
>> Hi Gokul,
>>
>> On Fri, Jun 10, 2016 at 2:08 PM, Gokul Balakrishnan <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> In DAS 3.0.x, for interacting with relational databases directly from
>>> Spark (i.e. bypassing the data access layer), we have hitherto been
>>> using the JDBC connector that ships with Apache Spark (with added
>>> support for Carbon datasources).
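The INSERT/UPSERT switching described above can be illustrated with a small standalone sketch. This is Python with sqlite3, not the connector's actual code; the table and column names are taken from the sample statement, and SQLite's INSERT OR REPLACE stands in for the UPSERT form a given RDBMS would use (e.g. INSERT ... ON DUPLICATE KEY UPDATE on MySQL):

```python
import sqlite3

# Illustrative sketch only: shows why a plain INSERT INTO fails on a
# table with a unique key, and how an UPSERT succeeds in its place.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE state_usage (
    us_state TEXT PRIMARY KEY,
    polarity INTEGER,
    usage_avg REAL)""")

conn.execute("INSERT INTO state_usage VALUES (?, ?, ?)", ("CA", 1, 0.5))

# A second plain INSERT for the same key violates the primary key,
# which is why the old connector forced users onto INSERT OVERWRITE:
try:
    conn.execute("INSERT INTO state_usage VALUES (?, ?, ?)", ("CA", 2, 0.7))
except sqlite3.IntegrityError:
    pass

# An UPSERT instead updates the existing row in place, preserving the
# rest of the table's data:
conn.execute("INSERT OR REPLACE INTO state_usage VALUES (?, ?, ?)",
             ("CA", 2, 0.7))

row = conn.execute("SELECT * FROM state_usage").fetchone()
print(row)  # prints ('CA', 2, 0.7)
```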
>>> This connector has had several issues that were detrimental to a
>>> proper user experience, including:
>>>
>>> - Having to create tables on the RDBMS beforehand, prior to query
>>> execution
>>> - Tables getting dropped and re-created with a Spark-dictated schema
>>> during initialisation
>>> - No support for RDBMS unique keys
>>> - Not being able to perform INSERT INTO queries on RDBMS tables which
>>> have unique keys set, forcing the user to depend on INSERT OVERWRITE,
>>> which clears the table and thereby loses historical data
>>>
>>> I have been working on overhauling this connector over the past couple
>>> of weeks to address the above flaws and bring it up to scratch. A new
>>> config file has also been introduced, containing the relevant
>>> information for a particular RDBMS flavour (such as parameterised
>>> query formats, data types etc.). An overview of all improvements
>>> follows:
>>>
>>> - RDBMS tables will be created dynamically (based on the schema
>>> provided by the user) if they don't already exist
>>
>> What is the data type to be used with fields in the schema? Is it SQL
>> types or data bridge data types? Could you please provide a sample
>> CREATE TABLE query?
>>
>>> - Pre-existing tables will be used as-is, without dropping/recreating
>>> - Recognition of primary keys, with automatic switching between
>>> INSERT/UPSERT modes during Spark's INSERT INTO calls
>>> - Support for creating DB indices, based on an additional input
>>> parameter
>>> - Spark INSERT OVERWRITE calls can be used to clear the existing table
>>> without affecting existing schema/index definitions
>>>
>>> This initial implementation can be found at [1]. It's written mostly
>>> in Scala.
>>>
>>> We've initially tested the connector against MySQL as part of the
>>> first cut, and will be testing against all DBs supported by DAS over
>>> the following days.
>>> The connector is expected to ship with the DAS 3.1.0 release.
>>>
>>> Thoughts welcome.
>>>
>>> [1] https://github.com/wso2/carbon-analytics/pull/187
>>>
>>> Thanks,
>>>
>>> --
>>> Gokul Balakrishnan
>>> Senior Software Engineer,
>>> WSO2, Inc. http://wso2.com
>>> M +94 77 5935 789 | +44 7563 570502
>>
>> --
>> Thanks & Regards,
>>
>> Inosh Goonewardena
>> Associate Technical Lead - WSO2 Inc.
>> Mobile: +94779966317
>
> --
> Gokul Balakrishnan
> Senior Software Engineer,
> WSO2, Inc. http://wso2.com
> M +94 77 5935 789 | +44 7563 570502

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
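The dynamic table creation and the "-i" index flag discussed in the thread can be sketched roughly as follows. This is a hypothetical Python helper for illustration only, not the connector's Scala code; the Spark-catalyst-to-SQL type mapping shown is an assumption standing in for the per-RDBMS config file mentioned above:

```python
# Hypothetical sketch: parse a "schema" option string such as
#   "us_state STRING -i, polarity INTEGER, usage_avg FLOAT"
# into CREATE TABLE / CREATE INDEX DDL. The type mapping is illustrative.
SPARK_TO_SQL = {"STRING": "VARCHAR(255)", "INTEGER": "INT", "FLOAT": "FLOAT"}

def build_ddl(table_name, schema, primary_keys=()):
    columns, indexed = [], []
    for field in schema.split(","):
        parts = field.split()
        name, spark_type = parts[0], parts[1]
        if "-i" in parts[2:]:           # optional index flag on the field
            indexed.append(name)
        columns.append(f"{name} {SPARK_TO_SQL[spark_type]}")
    if primary_keys:                    # unique keys enabling UPSERT mode
        columns.append("PRIMARY KEY (" + ", ".join(primary_keys) + ")")
    # Create only if absent, so pre-existing tables are used as-is:
    ddl = [f"CREATE TABLE IF NOT EXISTS {table_name} ("
           + ", ".join(columns) + ")"]
    for name in indexed:
        ddl.append(f"CREATE INDEX idx_{table_name}_{name} "
                   f"ON {table_name} ({name})")
    return ddl

ddl = build_ddl("state_usage",
                "us_state STRING -i, polarity INTEGER, usage_avg FLOAT",
                primary_keys=("us_state",))
for stmt in ddl:
    print(stmt)
```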
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
