Hi Maninda,

We have introduced some incremental data processing capabilities in the upcoming 3.1.0 release. Please note that this does not support fully functional incremental data processing with data aggregation. Basically, what we have done is introduce a way to fetch data based on time windows, so that the same data set is not iterated from the beginning again and again. To avoid data loss, we have introduced a buffer time period, and because of that, some events may be returned by select queries more than once across consecutive analytics task executions. As a result, some aggregation operations, such as average, can produce wrong results. We plan to introduce fully functional incremental data processing support in a future DAS release.
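The trade-off described above can be illustrated with a small Python toy model. It is only a sketch under stated assumptions: `IncrementalTable`, `run_task`, `NO_BUFFER`, and the millisecond timestamps are hypothetical names chosen for illustration, not the DAS 3.1.0 code or API, and the `-1` sentinel mirrors the buffer value proposed in the quoted message below. It shows how re-reading events that fall inside the buffer window skews a naive running average.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sentinel mirroring the proposed buffer value of -1:
# no buffer, read only events strictly newer than the committed timestamp.
NO_BUFFER = -1


@dataclass(frozen=True)
class Event:
    timestamp: int  # event time in ms (assumed unit)
    value: float


class IncrementalTable:
    """Toy model of a time-windowed incremental read; not the DAS implementation."""

    def __init__(self, buffer_ms: int) -> None:
        self.buffer_ms = buffer_ms
        self.events: List[Event] = []
        self.committed_ts: Optional[int] = None  # saved on each commit
        self._read_ts: Optional[int] = None      # newest timestamp read so far

    def insert(self, event: Event) -> None:
        self.events.append(event)

    def select(self) -> List[Event]:
        if self.committed_ts is None:
            batch = list(self.events)  # first execution reads everything
        else:
            lower = self.committed_ts
            if self.buffer_ms != NO_BUFFER:
                # Step back by the buffer so late-arriving events are not lost;
                # this is also why already-processed events are returned again.
                lower -= self.buffer_ms
            batch = [e for e in self.events if e.timestamp > lower]
        if batch:
            self._read_ts = max(e.timestamp for e in batch)
        return batch

    def commit(self) -> None:
        # Save the newest timestamp read as the start of the next window.
        if self._read_ts is not None:
            self.committed_ts = self._read_ts


def run_task(table: IncrementalTable, seen: List[float]) -> float:
    """One analytics execution: select, fold into a running average, commit."""
    seen.extend(e.value for e in table.select())  # naive: aggregate every returned row
    table.commit()
    return sum(seen) / len(seen) if seen else 0.0


buffered = IncrementalTable(buffer_ms=150)
buffered.insert(Event(100, 10.0))
buffered.insert(Event(200, 20.0))
seen: List[float] = []
first = run_task(buffered, seen)   # reads 100 and 200
buffered.insert(Event(300, 30.0))
second = run_task(buffered, seen)  # re-reads 100 and 200 (inside the buffer) plus 300
print(first, second)               # 15.0 18.0, although the true average is 20.0

no_buffer = IncrementalTable(buffer_ms=NO_BUFFER)
for ts in (100, 200, 300):
    no_buffer.insert(Event(ts, float(ts)))
print(len(no_buffer.select()))     # 3
no_buffer.commit()
print(len(no_buffer.select()))     # 0: nothing is newer than the saved timestamp 300
```

Without the buffer, the second select returns nothing, as described in the quoted message. The flip side is that an event inserted later with an older timestamp (say 250) would never be read, which is the data loss the buffer is meant to prevent.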
Regards,
Gihan

On Wed, Jun 8, 2016 at 11:53 AM, Maninda Edirisooriya <[email protected]> wrote:

> [Adding Architecture list]
>
> Hi all,
>
> A timestamp-based approach to incremental processing is problematic: we have had long discussions about it and could not reach an acceptable solution. Instead, I think the following kind of approach would work.
>
> 1. For each incremental analytics script, a boolean metadata column named "processed", set to "false", is added to the analytics table.
> 2. When an incremental script is executed on a data row, that row is updated with processed=true.
> 3. The next time the script is executed, it skips all rows with processed=true.
>
> This avoids the timestamp restriction and the buffer time issues, and allows parallel execution on records.
>
> Thanks.
>
> Maninda Edirisooriya
> Senior Software Engineer
>
> WSO2, Inc. lean.enterprise.middleware.
>
> Blog: http://maninda.blogspot.com/
> E-mail: [email protected]
> Skype: @manindae
> Twitter: @maninda
>
> On Wed, Jun 8, 2016 at 11:22 AM, Gihan Anuruddha <[email protected]> wrote:
>
>> Hi Guys,
>>
>> To fulfill the above requirement, we can add a query as below and make the necessary changes to the back end.
>>
>>     create temporary table t5 using CarbonAnalytics options (tableName "t3", schema "x INT, y INT", incrementalParams "t5, -1");
>>
>> Basically, we pass -1 as the buffer time. In the back end, if the buffer is -1, we only take the last processed event timestamp and fetch the data after it.
>>
>> If we insert 3 records and commit while the buffer is -1, and then run the select again without inserting any records, we get no results, since no new record was inserted after the saved timestamp.
>>
>> So what do you think about this implementation?
>>
>> Regards,
>> Gihan
>>
>> --
>> W.G. Gihan Anuruddha
>> Senior Software Engineer | WSO2, Inc.
>> M: +94772272595

--
W.G. Gihan Anuruddha
Senior Software Engineer | WSO2, Inc.
M: +94772272595
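Maninda's processed-flag approach (steps 1 to 3 in the quoted message above) could look roughly like the following minimal Python sketch. `Row`, `ScriptState`, `run_script`, and the per-script flag dictionary are hypothetical, assuming one "processed" flag per incremental script as step 1 suggests; a real analytics table would store the flag as a column and update it in the data store.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Row:
    value: float
    # Hypothetical metadata: one "processed" flag per incremental script,
    # absent (treated as False) until that script has handled the row.
    processed: Dict[str, bool] = field(default_factory=dict)


@dataclass
class ScriptState:
    # Running aggregate kept by one incremental script between executions.
    total: float = 0.0
    count: int = 0


def run_script(script_id: str, rows: List[Row], state: ScriptState) -> float:
    """One execution of an incremental script using per-row processed flags."""
    # Step 3: skip every row this script has already marked processed=true.
    pending = [r for r in rows if not r.processed.get(script_id, False)]
    for row in pending:
        state.total += row.value
        state.count += 1
        # Step 2: mark the row so later executions of this script ignore it.
        row.processed[script_id] = True
    return state.total / state.count if state.count else 0.0


rows = [Row(10.0), Row(20.0)]
avg_state = ScriptState()
first = run_script("avg_script", rows, avg_state)   # processes both rows
rows.append(Row(30.0))
second = run_script("avg_script", rows, avg_state)  # processes only the new row
third = run_script("avg_script", rows, avg_state)   # nothing new to process
print(first, second, third)  # 15.0 20.0 20.0
```

Because each row is read exactly once per script, the running average stays correct without any timestamp window. The cost this sketch hides is one extra write per processed row, and in a real store the flag updates and the aggregate need to be persisted together (for example in one transaction); otherwise a failure between the two can drop or double-count rows.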
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
