Hi Niranda,

No, not incremental data processing. My question is about deleting the
entire set of summary table records and re-inserting them. IMO, doing an
upsert would be more efficient than the approach above. Also, if there is
no other option, is the re-insert done as a batch operation, or are you
inserting records one by one?
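For illustration, this is roughly what I mean by a batch upsert: existing
summary rows are updated in place and new rows inserted, in one batched
statement, instead of deleting and re-inserting the whole table. (A sketch
only, using SQLite's INSERT ... ON CONFLICT syntax via Python's sqlite3;
MySQL would use INSERT ... ON DUPLICATE KEY UPDATE instead, and the table
and column names below are hypothetical.)

```python
import sqlite3

# Hypothetical summary table; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_usage_summary (api TEXT PRIMARY KEY, hits INTEGER)")
conn.execute("INSERT INTO api_usage_summary VALUES ('orders', 10)")

# Batch upsert: one statement applied over many rows, instead of
# truncating the table and re-inserting record by record.
# (ON CONFLICT ... DO UPDATE requires SQLite 3.24+.)
rows = [("orders", 25), ("payments", 7)]
conn.executemany(
    """INSERT INTO api_usage_summary (api, hits) VALUES (?, ?)
       ON CONFLICT(api) DO UPDATE SET hits = excluded.hits""",
    rows,
)
conn.commit()

print(dict(conn.execute("SELECT api, hits FROM api_usage_summary")))
# → {'orders': 25, 'payments': 7}
```

The existing 'orders' row is updated rather than deleted and re-created,
and untouched rows would survive the run unchanged.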

Regards,
Gihan

On Wed, Aug 12, 2015 at 11:40 AM, Niranda Perera <[email protected]> wrote:

> Hi Gihan,
>
> Are we talking about incremental processing here? INSERT INTO/OVERWRITE
> queries will normally be used to push analyzed data into summary tables.
>
> In Spark jargon, INSERT OVERWRITE TABLE means completely deleting the
> table and recreating it. I'm a bit confused about the meaning of
> 'overwrite' with respect to the previous 2.5.0 Hive scripts; were they
> doing an update there?
>
> rgds
>
> On Tue, Aug 11, 2015 at 7:58 PM, Gihan Anuruddha <[email protected]> wrote:
>
>> Hi Niranda,
>>
>> Are we going to resolve those limitations before the GA, especially
>> limitation no. 2? Over time the stat table can grow to thousands of
>> records, so are we going to remove all the records and re-insert them
>> every time the Spark script runs?
>>
>> Regards,
>> Gihan
>>
>> On Tue, Aug 11, 2015 at 7:13 AM, Niranda Perera <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> We have implemented a custom Spark JDBC connector for use in the
>>> Carbon environment.
>>>
>>> This enables the following:
>>>
>>>    1. Temporary tables can now be created in the Spark environment by
>>>    specifying an analytics datasource (configured via
>>>    analytics-datasources.xml) and a table name.
>>>    2. Spark uses a "SELECT 1 FROM $table LIMIT 1" query to check for
>>>    the existence of a table, but the LIMIT clause is not supported by
>>>    all DBs. With the new connector, this query can be supplied as a
>>>    config. (This config is still WIP.)
>>>    3. New Spark dialects are being added for various DBs (WIP).
>>>
>>> The idea is to test this with the following DBs:
>>>
>>>    - mysql
>>>    - h2
>>>    - mssql
>>>    - oracle
>>>    - postgres
>>>    - db2
>>>
>>> I have loosely tested the connector with MySQL. I would like the APIM
>>> team to use it with the API usage stats use case and provide us with
>>> feedback.
>>>
>>> The connector can be used as follows (docs are not yet updated; I will
>>> do that ASAP):
>>>
>>> create temporary table <temp_table> using CarbonJDBC options (dataSource
>>> "<datasource name>", tableName "<table name>");
>>>
>>> select * from <temp_table>
>>>
>>> insert into/overwrite table <temp_table> <some select statement>
>>>
>>> Known limitations:
>>>
>>>    1. When creating a temp table, the table must already exist in the
>>>    underlying datasource.
>>>    2. "insert overwrite table" deletes the existing table and recreates
>>>    it.
>>>
>>>
>>> We would be very grateful if you could use this connector in your
>>> current JDBC use cases and provide us with feedback.
>>>
>>> best
>>> --
>>> *Niranda Perera*
>>> Software Engineer, WSO2 Inc.
>>> Mobile: +94-71-554-8430
>>> Twitter: @n1r44 <https://twitter.com/N1R44>
>>> https://pythagoreanscript.wordpress.com/
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> [email protected]
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> W.G. Gihan Anuruddha
>> Senior Software Engineer | WSO2, Inc.
>> M: +94772272595
>>
>> _______________________________________________
>> Dev mailing list
>> [email protected]
>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>
>>
>
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>
> https://pythagoreanscript.wordpress.com/
>



-- 
W.G. Gihan Anuruddha
Senior Software Engineer | WSO2, Inc.
M: +94772272595
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
