Hi Niranda,

No, not incremental data processing. My question is about deleting the entire set of summary table records and re-inserting them again. IMO, doing an upsert would be more efficient than the approach above. Also, if there is no other option, is that re-insert done as a batch operation, or are you inserting records one by one?
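For example, with MySQL (one of the target DBs in the thread below), the changed summary rows could be refreshed in one batched statement instead of a delete-all-and-reinsert. This is only a sketch; the API_STATS_SUMMARY table and its columns are hypothetical, and it assumes a unique key on (api_name, time_bucket):

-- hypothetical summary table; assumes a unique key on (api_name, time_bucket)
INSERT INTO API_STATS_SUMMARY (api_name, time_bucket, request_count)
VALUES
  ('orders-api',   '2015-08-12 11:00:00', 42),
  ('payments-api', '2015-08-12 11:00:00', 17)
ON DUPLICATE KEY UPDATE request_count = VALUES(request_count);

Most of the other target DBs have an equivalent (MERGE on Oracle, MSSQL, DB2, and H2), so the same idea should port across the list.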
Regards,
Gihan

On Wed, Aug 12, 2015 at 11:40 AM, Niranda Perera <[email protected]> wrote:

> Hi Gihan,
>
> are we talking about incremental processing here? insert into/overwrite
> queries will normally be used to push analyzed data into summary tables.
>
> in the Spark jargon, 'insert overwrite table' means completely deleting
> the table and recreating it. I'm a bit confused about the meaning of
> 'overwrite' with respect to the previous 2.5.0 Hive scripts; are we doing
> an update there?
>
> rgds
>
> On Tue, Aug 11, 2015 at 7:58 PM, Gihan Anuruddha <[email protected]> wrote:
>
>> Hi Niranda,
>>
>> Are we going to solve those limitations before the GA? Especially
>> limitation no. 2. Over time we can have a stat table with thousands of
>> records, so are we going to remove all the records and re-insert them
>> every time that Spark script runs?
>>
>> Regards,
>> Gihan
>>
>> On Tue, Aug 11, 2015 at 7:13 AM, Niranda Perera <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> we have implemented a custom Spark JDBC connector to be used in the
>>> Carbon environment.
>>>
>>> this enables the following:
>>>
>>> 1. Temporary tables can now be created in the Spark environment by
>>> specifying an analytics datasource (configured by
>>> analytics-datasources.xml) and a table.
>>> 2. Spark uses the "SELECT 1 FROM $table LIMIT 1" query to check the
>>> existence of a table, and the LIMIT clause is not provided by all DBs.
>>> With the new connector, this query can be supplied as a config. (this
>>> config is still WIP)
>>> 3. Adding new Spark dialects for various DBs (WIP)
>>>
>>> the idea is to test this against the following DBs:
>>>
>>> - mysql
>>> - h2
>>> - mssql
>>> - oracle
>>> - postgres
>>> - db2
>>>
>>> I have loosely tested the connector with MySQL, and I would like the
>>> APIM team to use it for the API usage stats use case and provide us
>>> with some feedback.
>>>
>>> the connector can be used as follows (docs are not updated yet; I will
>>> do that ASAP):
>>>
>>> create temporary table <temp_table> using CarbonJDBC options (dataSource
>>> "<datasource name>", tableName "<table name>");
>>>
>>> select * from <temp_table>
>>>
>>> insert into/overwrite table <temp_table> <some select statement>
>>>
>>> known limitations:
>>>
>>> 1. when creating a temp table, the table should already exist in the
>>> underlying datasource
>>> 2. "insert overwrite table" deletes the existing table and creates it
>>> again
>>>
>>> would be very grateful if you could use this connector in your current
>>> JDBC use cases and provide us with feedback.
>>>
>>> best
>>> --
>>> *Niranda Perera*
>>> Software Engineer, WSO2 Inc.
>>> Mobile: +94-71-554-8430
>>> Twitter: @n1r44 <https://twitter.com/N1R44>
>>> https://pythagoreanscript.wordpress.com/
>>
>>
>> --
>> W.G. Gihan Anuruddha
>> Senior Software Engineer | WSO2, Inc.
>> M: +94772272595
>
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>
> https://pythagoreanscript.wordpress.com/

--
W.G. Gihan Anuruddha
Senior Software Engineer | WSO2, Inc.
M: +94772272595
