Hi Gokul,

Will this allow us to perform INSERT INTO queries with sample data (not
from a table)? This is useful in the DEV phase.
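A hedged sketch of what such a development-time query might look like with this connector; the table name and values below are hypothetical, and the Spark SQL version bundled with DAS may require a SELECT of literals rather than a VALUES clause:

```sql
-- Hypothetical dev-phase query: insert literal sample rows into a
-- CarbonJDBC-backed temporary table. If the bundled Spark SQL version
-- does not accept a VALUES clause, selecting literals is a common
-- workaround.
INSERT INTO TABLE devOrders
SELECT "order-001" AS order_id, 42.5 AS amount;
```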
Cheers~

On Tue, Jun 14, 2016 at 7:38 AM, Gokul Balakrishnan <[email protected]> wrote:

> Hi product analytics leads,
>
> Please make sure that the configuration file spark-jdbc-config.xml is
> added to the product-analytics packs, especially if you're using the
> CarbonJDBC provider. An example commit may be found at [1].
>
> [1] https://github.com/wso2/product-das/commit/4bdbf68833bd2bc8a20549eaf726873cacde468f
>
> Thanks,
>
> On 13 June 2016 at 17:37, Gokul Balakrishnan <[email protected]> wrote:
>
>> Hi Anjana, Nirmal,
>>
>> The schema being mandatory is an architectural decision we've had to
>> take. To go into a bit more detail as to the reasons: Spark requires its
>> own catalyst schema to be constructed when a relation is being created.
>> In the previous implementation, this was achieved by dropping the target
>> RDBMS table and re-creating it in a format Spark understands. In the
>> current implementation, however, we have removed the need for any DML
>> operation during table creation, unless specifically requested.
>>
>> The issue with making this parameter optional is that we would have to
>> fall back to the earlier behaviour of inferring the schema from the
>> table metadata whenever it is not specified. This would mean maintaining
>> a list of reverse mappings, which would pollute the implementation.
>> Moreover, we would have inconsistencies when certain table schemata are
>> inferred while others are specified.
>>
>> Please note that this is neither an API change nor a change in
>> deployable artefacts: the user merely has to edit his/her DAS extensions
>> (i.e. Spark scripts) if applicable. We will clearly point out the
>> changes that need to be done in the DAS 3.1.0 migration guide.
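For reference, a sketch of what a per-flavour entry in spark-jdbc-config.xml might contain, based on the description of the file elsewhere in this thread (parameterised query formats, datatypes); the element names and placeholders below are illustrative assumptions, not the actual file schema:

```xml
<!-- Illustrative sketch only: element names and placeholders are
     assumptions, not the real spark-jdbc-config.xml schema. -->
<database name="mysql">
  <!-- Mapping from Spark catalyst types to native column types -->
  <typeMapping>
    <string>VARCHAR(254)</string>
    <integer>INTEGER</integer>
    <double>DOUBLE</double>
  </typeMapping>
  <!-- Parameterised query format used for UPSERT on this flavour -->
  <upsertQuery>
    INSERT INTO {{TABLE}} ({{COLUMNS}}) VALUES ({{VALUES}})
    ON DUPLICATE KEY UPDATE {{KEY_VALUE_PAIRS}}
  </upsertQuery>
</database>
```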
>>
>> Thanks,
>>
>> On 13 June 2016 at 16:44, Anjana Fernando <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> On Mon, Jun 13, 2016 at 12:23 PM, Nirmal Fernando <[email protected]> wrote:
>>>
>>>>> The "schema" option is required, and is used to specify the schema
>>>>> to be utilised throughout the temporary table's lifetime. Here, the
>>>>> field types used for the schema match what we have for the
>>>>> CarbonAnalytics provider (i.e. not JDBC nor databridge), and
>>>>> correspond to Spark catalyst types. Moreover, the optional "-i"
>>>>> keyword, if specified for a field, will create an RDBMS index for
>>>>> that field.
>>>>
>>>> Can't we make 'schema' optional as it was earlier? Otherwise this
>>>> introduces a backward-incompatible change.
>>>
>>> The schema was optional before because the earlier implementation
>>> mandated that the user create the table beforehand, which was not
>>> desirable; for subsequent INSERT OVERWRITE statements, it dropped the
>>> table and tried to re-create it, and didn't do a good job of it. So
>>> this approach was taken to make the way we create tables more
>>> consistent. Also, in the new implementation, we need the schema to be
>>> known beforehand, including its primary keys etc., to do the
>>> operations properly. But yes, for the sake of backward compatibility,
>>> we can do a somewhat best-effort implementation by looking up the
>>> table metadata through JDBC and trying to figure out the table schema,
>>> mainly the primary keys, which is the critical information we need.
>>> This is not something we can always expect from JDBC, though, since
>>> some DBMSs may not expose it properly. So in any case, it is highly
>>> recommended to move to the new approach when you're using CarbonJDBC,
>>> but we will try to do a best-effort implementation to retain backward
>>> compatibility. @Gokul, please check on this.
>>>
>>> Cheers,
>>> Anjana.
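To make the "schema" option and the "-i" keyword concrete, a best-guess sketch of a CarbonJDBC table definition; the datasource, table, and field names are hypothetical, and the exact option keys are assumptions rather than confirmed syntax:

```sql
-- Sketch: a CarbonJDBC temporary table with an explicit schema.
-- Field types are Spark catalyst types (the CarbonAnalytics
-- convention); "-i" after a field requests an RDBMS index on it.
CREATE TEMPORARY TABLE orders
USING CarbonJDBC
OPTIONS (
  dataSource "ORDERS_DB",   -- Carbon datasource name (hypothetical)
  tableName  "ORDERS",
  schema     "order_id STRING, customer STRING -i, amount DOUBLE"
);
```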
>>>>
>>>>> The "primaryKeys" option is not mandatory, and may be used to
>>>>> denote unique key fields in the underlying RDBMS table. It is based
>>>>> on this option that INSERT or UPSERT queries will be chosen when
>>>>> performing Spark INSERT INTO queries, as explained above.
>>>>>
>>>>> We're in the process of documenting the usage patterns of this
>>>>> provider so that they can be better understood.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On 10 June 2016 at 15:16, Inosh Goonewardena <[email protected]> wrote:
>>>>>
>>>>>> Hi Gokul,
>>>>>>
>>>>>> On Fri, Jun 10, 2016 at 2:08 PM, Gokul Balakrishnan <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> In DAS 3.0.x, for interacting with relational databases directly
>>>>>>> from Spark (i.e. bypassing the data access layer), we have
>>>>>>> hitherto been using the JDBC connector that comes directly with
>>>>>>> Apache Spark (with added support for Carbon datasources).
>>>>>>>
>>>>>>> This connector has contained many issues that have been
>>>>>>> detrimental to proper user experience, including:
>>>>>>>
>>>>>>> - Having to create tables on the RDBMS beforehand, prior to query
>>>>>>> execution
>>>>>>> - Tables getting dropped and re-created with a Spark-dictated
>>>>>>> schema during initialization
>>>>>>> - No support for RDBMS unique keys
>>>>>>> - Not being able to perform INSERT INTO queries on RDBMS tables
>>>>>>> which have unique keys set; as a result, the user has had to
>>>>>>> depend upon INSERT OVERWRITE, which clears the table, resulting in
>>>>>>> the loss of historical data
>>>>>>>
>>>>>>> I have been working on overhauling this connector over the past
>>>>>>> couple of weeks to address the above flaws and bring it up to
>>>>>>> scratch. A new config file, which contains the relevant
>>>>>>> information for a particular RDBMS flavour (such as parameterised
>>>>>>> query formats, datatypes, etc.), has also been introduced.
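The "primaryKeys" behaviour described above can be sketched as follows; as before, the option keys, datasource, and table names are illustrative assumptions rather than confirmed syntax:

```sql
-- Sketch: with "primaryKeys" set, a Spark INSERT INTO is expected to
-- translate to an UPSERT on the underlying RDBMS table; without it,
-- a plain INSERT. Names below are hypothetical.
CREATE TEMPORARY TABLE orders
USING CarbonJDBC
OPTIONS (
  dataSource  "ORDERS_DB",
  tableName   "ORDERS",
  schema      "order_id STRING, amount DOUBLE",
  primaryKeys "order_id"
);

-- Re-running this updates rows that already exist for a given
-- order_id instead of failing on the unique key, so historical data
-- is preserved.
INSERT INTO TABLE orders SELECT order_id, amount FROM stagingOrders;
```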
An overview of all improvements is as follows:
>>>>>>>
>>>>>>> - RDBMS tables will be created dynamically (based on the schema
>>>>>>> provided by the user) if they don't already exist
>>>>>>
>>>>>> What is the data type to be used with fields in the schema? Is it
>>>>>> SQL types or data bridge data types? Could you please provide a
>>>>>> sample CREATE TABLE query?
>>>>>>
>>>>>>> - Pre-existing tables will be used as-is, without being dropped
>>>>>>> and re-created
>>>>>>> - Primary keys are recognised, with automatic switching between
>>>>>>> INSERT and UPSERT modes during Spark's INSERT INTO calls
>>>>>>> - Support for creating DB indices, based on an additional input
>>>>>>> parameter
>>>>>>> - Spark INSERT OVERWRITE calls can be used to clear the existing
>>>>>>> table without affecting the existing schema/index definitions
>>>>>>>
>>>>>>> This initial implementation can be found at [1]. It's written
>>>>>>> mostly in Scala.
>>>>>>>
>>>>>>> We've tested the connector against MySQL as a first cut, and we
>>>>>>> will be testing against all DBs supported by DAS over the
>>>>>>> following days. The connector is expected to be shipped with the
>>>>>>> DAS 3.1.0 release.
>>>>>>>
>>>>>>> Thoughts welcome.
>>>>>>>
>>>>>>> [1] https://github.com/wso2/carbon-analytics/pull/187
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gokul
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Inosh Goonewardena
>>>>>> Associate Technical Lead - WSO2 Inc.
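The last improvement in the list above might look like this in practice; the table and source names are hypothetical:

```sql
-- Sketch: under the new connector, INSERT OVERWRITE clears the rows
-- of the underlying RDBMS table but leaves its definition and indices
-- intact (previously the table itself was dropped and re-created).
INSERT OVERWRITE TABLE orders
SELECT order_id, amount FROM freshOrders;
```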
>>>>
>>>> Thanks & regards,
>>>> Nirmal
>>>> Team Lead - WSO2 Machine Learner
>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>
>>> --
>>> Anjana Fernando
>>> Senior Technical Lead
>>> WSO2 Inc. | http://wso2.com
>>
>> --
>> Gokul Balakrishnan
>> Senior Software Engineer,
>> WSO2, Inc. | http://wso2.com

--
Dulitha Wijewantha (Chan)
Software Engineer - Mobile Development
WSO2 Inc. | Lean . Enterprise . Middleware
Email: [email protected] | Mobile: +94712112165
Website: http://dulitha.me | Twitter: @dulitharw
GitHub: @dulichan | SO: @chan
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
