Hi product analytics leads,

Please make sure that the configuration file spark-jdbc-config.xml is added to the product-analytics packs, especially if you're using the CarbonJDBC provider. An example commit may be found at [1].
[1] https://github.com/wso2/product-das/commit/4bdbf68833bd2bc8a20549eaf726873cacde468f

Thanks,

On 13 June 2016 at 17:37, Gokul Balakrishnan <[email protected]> wrote:

> Hi Anjana, Nirmal,
>
> The schema being mandatory is an architectural decision we've had to take.
> To go into a bit more detail as to the reasons: Spark requires its own
> catalyst schema to be constructed when a relation is being created. In the
> previous implementation, this was achieved by dropping the target RDBMS
> table and recreating it in a format Spark understands. In the current
> implementation, however, we have removed the need for any DML operation
> during table creation, unless specifically requested.
>
> The issue with making this parameter optional is that we would have to
> fall back to the earlier behaviour of the schema being inferred from the
> table metadata when it is not specified. This would mean maintaining a
> list of reverse mappings, which would pollute the implementation.
> Moreover, we would have inconsistencies when certain table schemata are
> inferred while others are specified.
>
> Please note that this is neither an API change nor a change in deployable
> artefacts: the user merely has to edit his/her DAS extensions (i.e. Spark
> scripts) if applicable. We will clearly point out the changes that need to
> be done in the DAS 3.1.0 migration guide.
>
> Thanks,
>
> On 13 June 2016 at 16:44, Anjana Fernando <[email protected]> wrote:
>
>> Hi,
>>
>> On Mon, Jun 13, 2016 at 12:23 PM, Nirmal Fernando <[email protected]>
>> wrote:
>>
>>>> The "schema" option is required, and is used to specify the schema to
>>>> be utilised throughout the temporary table's lifetime. Here, the field
>>>> types used for the schema match what we have for the CarbonAnalytics
>>>> provider (i.e. not JDBC nor databridge), and correspond to Spark
>>>> catalyst types. Moreover, the optional "-i" keyword, if specified for
>>>> a field, will create an RDBMS index for that field.
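[Editor's note: for illustration, a table definition along the lines described above might look like the following sketch. The "dataSource" and "tableName" option names and all identifiers are assumptions made for this example and are not taken from the thread; only "schema", the catalyst field types, and the "-i" index marker are described above.]

```sql
-- Hypothetical Spark script fragment for the CarbonJDBC provider.
-- "dataSource"/"tableName" option names and all identifiers are assumed;
-- "schema" uses Spark catalyst types, and the optional "-i" marker on a
-- field creates an RDBMS index for that field.
CREATE TEMPORARY TABLE orders
USING CarbonJDBC
OPTIONS (
    dataSource "MY_RDBMS_DATASOURCE",
    tableName  "ORDERS",
    schema     "order_id STRING, customer STRING -i, amount DOUBLE, ts LONG"
);
```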
>>>
>>> Can't we make 'schema' optional as it was earlier? Otherwise this
>>> introduces a backward-incompatible change.
>>>
>>
>> The schema was optional before because the earlier implementation
>> required the user to create the table beforehand, which was not
>> desirable; and for subsequent "INSERT OVERWRITE" statements it would
>> drop the table and try to re-create it, and didn't do a good job of it.
>> So this approach was taken to make the way we create the tables more
>> consistent. Also, in the new implementation, we need the schema to be
>> known beforehand, including its primary keys etc., to do the operations
>> properly. But yes, for the sake of backward compatibility, we can do a
>> best-effort implementation by looking up the table schema using JDBC,
>> mainly the primary keys, which is the critical information we need.
>> This is not something we can always expect from JDBC, though, as some
>> DBMSs may not expose it properly. So it is highly recommended to move
>> to the new approach when you're using CarbonJDBC, but we will try to do
>> a best-effort implementation to retain backward compatibility. @Gokul,
>> please check on this.
>>
>> Cheers,
>> Anjana.
>>
>>>> The "primaryKeys" option is not mandatory, and may be used to denote
>>>> unique key fields in the underlying RDBMS table. It is based on this
>>>> option that INSERT or UPSERT queries will be chosen when executing
>>>> Spark INSERT INTO queries, as explained above.
>>>>
>>>> We're in the process of documenting the usage patterns of this
>>>> provider so that they can be better understood.
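[Editor's note: a sketch of the "primaryKeys" option described above. As before, all identifiers and the "dataSource"/"tableName" option names are assumptions; the RDBMS query shown in the trailing comment is an assumed illustration of what an upsert typically looks like on MySQL, not the connector's verbatim output.]

```sql
-- Hypothetical Spark script fragment: declaring unique key fields via the
-- "primaryKeys" option. All identifiers are assumed for illustration.
CREATE TEMPORARY TABLE orders
USING CarbonJDBC
OPTIONS (
    dataSource  "MY_RDBMS_DATASOURCE",
    tableName   "ORDERS",
    schema      "order_id STRING, customer STRING, amount DOUBLE",
    primaryKeys "order_id"
);

-- With primaryKeys set, a Spark "INSERT INTO TABLE orders ..." call would
-- be executed as an UPSERT on the underlying store; on MySQL, for example,
-- an upsert is typically expressed as:
--   INSERT INTO ORDERS (order_id, customer, amount) VALUES (?, ?, ?)
--     ON DUPLICATE KEY UPDATE customer = VALUES(customer),
--                             amount = VALUES(amount);
-- Without primaryKeys, a plain INSERT is issued instead.
```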
>>>>
>>>> Thanks,
>>>>
>>>> On 10 June 2016 at 15:16, Inosh Goonewardena <[email protected]> wrote:
>>>>
>>>>> Hi Gokul,
>>>>>
>>>>> On Fri, Jun 10, 2016 at 2:08 PM, Gokul Balakrishnan <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> In DAS 3.0.x, for interacting with relational databases directly
>>>>>> from Spark (i.e. bypassing the data access layer), we have hitherto
>>>>>> been using the JDBC connector that comes directly with Apache Spark
>>>>>> (with added support for Carbon datasources).
>>>>>>
>>>>>> This connector has contained many issues that have been detrimental
>>>>>> to a proper user experience, including:
>>>>>>
>>>>>> - Having to create tables on the RDBMS beforehand, prior to query
>>>>>> execution
>>>>>> - Tables getting dropped and re-created with a Spark-dictated
>>>>>> schema during initialisation
>>>>>> - No support for RDBMS unique keys
>>>>>> - Not being able to perform INSERT INTO queries on RDBMS tables
>>>>>> which have unique keys set, and as a result the user having to
>>>>>> depend upon INSERT OVERWRITE, which clears the table, resulting in
>>>>>> the loss of historical data
>>>>>>
>>>>>> I have been working on overhauling this connector over the past
>>>>>> couple of weeks to address the above flaws and bring it up to
>>>>>> scratch. A new config file, which contains the relevant information
>>>>>> for a particular RDBMS flavour (such as parameterised query formats,
>>>>>> datatypes, etc.), has also been introduced. An overview of all
>>>>>> improvements is as follows:
>>>>>>
>>>>>> - RDBMS tables will be created dynamically (based on the schema
>>>>>> provided by the user) if they don't already exist
>>>>>
>>>>> What are the data types to be used for fields in the schema? Are
>>>>> they SQL types or data bridge data types? Could you please provide a
>>>>> sample create table query?
>>>>>
>>>>>> - Pre-existing tables will be appropriated for use without
>>>>>> dropping/recreating
>>>>>> - Recognition of primary keys, and automatic switching between
>>>>>> INSERT/UPSERT modes during Spark's INSERT INTO calls
>>>>>> - Support for creating DB indices, based on an additional input
>>>>>> parameter
>>>>>> - Spark INSERT OVERWRITE calls can be used to clear the existing
>>>>>> table without affecting existing schema/index definitions
>>>>>>
>>>>>> This initial implementation can be found at [1]. It's written
>>>>>> mostly in Scala.
>>>>>>
>>>>>> We've initially tested the connector against MySQL as part of the
>>>>>> first cut, and will be testing against all DBs supported by DAS
>>>>>> over the following days. The connector is expected to be shipped
>>>>>> with the DAS 3.1.0 release.
>>>>>>
>>>>>> Thoughts welcome.
>>>>>>
>>>>>> [1] https://github.com/wso2/carbon-analytics/pull/187
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Gokul Balakrishnan
>>>>>> Senior Software Engineer,
>>>>>> WSO2, Inc. http://wso2.com
>>>>>> M +94 77 5935 789 | +44 7563 570502
>>>>>>
>>>>>> _______________________________________________
>>>>>> Architecture mailing list
>>>>>> [email protected]
>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>>
>>>>> Inosh Goonewardena
>>>>> Associate Technical Lead - WSO2 Inc.
>>>>> Mobile: +94779966317
>>>>
>>>> --
>>>> Gokul Balakrishnan
>>>> Senior Software Engineer,
>>>> WSO2, Inc.
>>>> http://wso2.com
>>>> M +94 77 5935 789 | +44 7563 570502
>>>
>>> --
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Team Lead - WSO2 Machine Learner
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>>
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>
> --
> Gokul Balakrishnan
> Senior Software Engineer,
> WSO2, Inc. http://wso2.com
> M +94 77 5935 789 | +44 7563 570502

--
Gokul Balakrishnan
Senior Software Engineer,
WSO2, Inc. http://wso2.com
M +94 77 5935 789 | +44 7563 570502
