[Adding Architecture list]

Hi all,

A timestamp-based approach to incremental processing is problematic: we have discussed it at length and could not reach an acceptable solution. Instead, I think the following kind of approach would work.

1. For each incremental analytics script, a boolean metadata column named "processed" is added to the analytics table, initially set to "false".
2. When an incremental script is executed on a data row, that row is updated with *processed=true*.
3. The next time the script is executed, it skips all rows with *processed=true*.

This avoids the timestamp restriction and the buffer time issues, and allows parallel execution on records.

Thanks.

*Maninda Edirisooriya*
Senior Software Engineer

*WSO2, Inc.*
lean.enterprise.middleware.

*Blog* : http://maninda.blogspot.com/
*E-mail* : [email protected]
*Skype* : @manindae
*Twitter* : @maninda

On Wed, Jun 8, 2016 at 11:22 AM, Gihan Anuruddha <[email protected]> wrote:

> Hi Guys,
>
> To fulfill the above requirement, we can add a query like the one below and
> make the necessary changes to the back end.
>
> *create temporary table t5 using CarbonAnalytics options (tableName "t3",
> schema "x INT, y INT", incrementalParams "t5, -1");*
>
> Basically, we are passing -1 as the buffer time. In the back end, if the
> buffer is -1, we take only the last processed event timestamp and fetch the
> data.
>
> If we insert 3 records and commit while the buffer is -1, and then run the
> select again without inserting any records, we get no results, since no new
> records were inserted after the saved timestamp.
>
> So what do you think about this implementation?
>
> Regards,
> Gihan
>
> --
> W.G. Gihan Anuruddha
> Senior Software Engineer | WSO2, Inc.
> M: +94772272595
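
To make the three "processed" flag steps at the top of this message concrete, here is a minimal sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the analytics table, not CarbonAnalytics or the actual back end, and the `id` column and the names `create_table`, `insert_rows` and `run_incremental` are hypothetical. The table name "t3" and the columns x and y are borrowed from the quoted query. The "analysis" here is just x + y per row.

```python
import sqlite3


def create_table(conn):
    # Step 1: the table carries a "processed" metadata column, initially false (0).
    # The "id" column is an assumption added so that rows can be updated individually.
    conn.execute(
        "CREATE TABLE t3 ("
        "id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, "
        "processed INTEGER NOT NULL DEFAULT 0)"
    )


def insert_rows(conn, rows):
    # New records arrive unprocessed because of the column default.
    conn.executemany("INSERT INTO t3 (x, y) VALUES (?, ?)", rows)
    conn.commit()


def run_incremental(conn):
    # Step 3: skip every row that is already marked processed.
    rows = conn.execute(
        "SELECT id, x, y FROM t3 WHERE processed = 0 ORDER BY id"
    ).fetchall()

    # Placeholder analysis: sum the two columns of each unprocessed row.
    results = [x + y for _, x, y in rows]

    # Step 2: mark exactly the rows that were just handled as processed.
    conn.executemany(
        "UPDATE t3 SET processed = 1 WHERE id = ?",
        [(row_id,) for row_id, _, _ in rows],
    )
    conn.commit()
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    insert_rows(conn, [(1, 2), (3, 4), (5, 6)])
    print(run_incremental(conn))  # first run handles the three new rows
    print(run_incremental(conn))  # second run finds nothing left to process
```

Note that the sketch covers a single script. With several incremental scripts on the same table, each script would need its own flag column, as step 1 implies.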
