On Thu, Jun 16, 2016 at 9:35 AM, Srinath Perera <[email protected]> wrote:
> I think this is OK for this release.
>
> However, we need to handle this cleanly and document clearly how it works for the next release. I think this is more of a problem for CEP (adding Suho).
>
> 1. Can we add a Jira so we will handle this in the future?
>
> Created 2 Jiras, for batch and real-time analytics:
> https://wso2.org/jira/browse/CEP-1519
> https://wso2.org/jira/browse/DAS-436
>
> 2. Can we update the documents to describe the current behaviour?
>
> Yes. We will explain the current behavior clearly.
>
> --Srinath
>
> On Wed, Jun 15, 2016 at 9:03 PM, Inosh Goonewardena <[email protected]> wrote:
>
>> On Wed, Jun 15, 2016 at 4:20 PM, Inosh Goonewardena <[email protected]> wrote:
>>
>>> Hi Srinath,
>>>
>>>> A DAS deployment might end up collecting data from multiple time zones?
>>>> Is that supported? How is it handled? Do end users have to issue queries in UTC?
>>>
>>> We receive epoch time in the timestamp fields of incoming events. Based on that timestamp we extract time attributes such as year, month, day, hour, etc. and do per-<time unit> aggregations (such as per minute/hour/day counts). At the moment we extract those time attributes based on the local time zone the DAS server is running in. We started a discussion [1] on the possibility of using UTC time instead, but decided against it due to the discrepancies that can occur in window boundaries.
>>>
>>> For example, if the DAS server is running in the IST time zone, some example time window boundaries are as follows:
>>> - day window [1461090600000 - 1461177000000] (2016/04/20 12:00:00 AM IST - 2016/04/21 12:00:00 AM IST)
>>> - hour window [1461108600000 - 1461112200000] (2016/04/20 05:00:00 AM IST - 2016/04/20 06:00:00 AM IST)
>>>
>>> So with the above approach, the time zone of the data publisher we collect data from is not considered, and windowing is done based on the local time zone.
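The local-time-zone windowing described above can be sketched as follows (illustrative only, not DAS code; IST is assumed to be the fixed offset UTC+5:30, matching the boundaries quoted):

```python
from datetime import datetime, timedelta, timezone

# Assumption: IST modeled as a fixed UTC+5:30 offset.
IST = timezone(timedelta(hours=5, minutes=30))

def day_window(epoch_ms, tz):
    """Return the [start, end) day-window boundaries, in epoch millis,
    containing epoch_ms, aligned to midnight in the given time zone."""
    dt = datetime.fromtimestamp(epoch_ms / 1000, tz)
    start = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)

# An event at 2016/04/20 05:48:58 PM IST falls into the day window
# quoted above:
print(day_window(1461111538669, IST))  # (1461090600000, 1461177000000)
```

The same floor-to-local-midnight idea generalizes to hour, minute, and month windows by zeroing out smaller fields in the local-time representation.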
>>> Actually, if the data publisher publishes data from a different time zone and the user wants to see window-based aggregates (in charts) based on that time zone, there will still be an issue, because the DAS window boundaries are different from what the user expects.
>>
>> However, this can be avoided by doing windowing based on the time zone the data is collected from. Basically, what is explained above is the default behavior when the built-in temporal functions are used (e.g. the time:extract() Siddhi extension [1] and the DateTimeUDF [2] in Spark). But it is possible to extend those extensions and do windowing based on the time zone the user is interested in. However, in order to support incremental data processing in such cases, we will need to take the time zone as an input parameter in the incremental query and set the incremental processing window start time based on it.
>>
>> [1] https://github.com/wso2/siddhi/blob/master/modules/siddhi-extensions/time/src/main/java/org/wso2/siddhi/extension/time/ExtractAttributesFunctionExtension.java
>> [2] https://github.com/wso2/shared-analytics/blob/master/components/spark-udf/org.wso2.carbon.analytics.shared.spark.common.udf/src/main/java/org/wso2/carbon/analytics/shared/common/udf/DateTimeUDF.java
>>
>>> [1] [Analytics] Using UTC across all temporal analytics functions
>>>
>>>> Thoughts please.
>>>>
>>>> --Srinath
>>>
>>> On Wed, Jun 15, 2016 at 2:11 AM, Mohanadarshan Vivekanandalingam <[email protected]> wrote:
>>>
>>>> Hi Inosh,
>>>>
>>>> Thanks for the fix. We have incorporated this into analytics-is.
>>>>
>>>> Regards,
>>>> Mohan
>>>>
>>>> On Tue, Jun 14, 2016 at 7:33 PM, Inosh Goonewardena <[email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We made a modification to the incremental data processing syntax and implementation in order to avoid inconsistencies that occur with time windows.
>>>>> For example, with the previous approach, if the timeWindow was specified as 3600 (or 86400), the incremental read time got set to the hour start (or day start) time in UTC. This is due to the fact that the incremental start time is calculated by performing a modulo operation on a timestamp. In most scenarios, analytics is implemented on a per-<time unit> basis, i.e., we maintain summary tables per minute, per hour, per day, per month. In these summary tables, we do windowing based on the local time zone of the server DAS runs on [1]. Since the incremental time window starts at a UTC time (hour or day), inconsistencies happen in the aggregate values of time windows.
>>>>>
>>>>> With the newly introduced changes, the incremental window start time is set based on the local time zone to avoid the aforementioned inconsistencies. The syntax is as follows:
>>>>>
>>>>> CREATE TEMPORARY TABLE T1 USING CarbonAnalytics options (tableName "test",
>>>>> incrementalProcessing "T1, *WindowUnit*, WindowOffset");
>>>>>
>>>>> The WindowUnit concept is similar to the previous concept of TimeWindow, but here the corresponding window time unit needs to be provided instead of the time in millis. The units supported are SECOND, MINUTE, HOUR, DAY, MONTH, YEAR.
>>>>>
>>>>> For example, let's say the server runs in IST time and the last processed event time is 1461111538669 (2016/04/20 05:48:58 PM IST). If the WindowUnit is set to DAY and WindowOffset is set to 0 in the incremental table, the next script execution will read data starting from 1461090600000 (2016/04/20 12:00:00 AM IST).
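The difference between the old modulo-based start time and the new local-time-zone start time can be sketched with the example timestamp above (illustrative only, not DAS code; IST assumed as the fixed offset UTC+5:30):

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # assumed server time zone

last_processed = 1461111538669  # 2016/04/20 05:48:58 PM IST

# Previous behaviour: a modulo operation floors to the UTC day boundary.
utc_day_start = last_processed - last_processed % 86400000

# New behaviour (WindowUnit DAY): floor to the server's local midnight.
local_midnight = datetime.fromtimestamp(last_processed / 1000, IST).replace(
    hour=0, minute=0, second=0, microsecond=0)
local_day_start = int(local_midnight.timestamp() * 1000)

print(utc_day_start)    # 1461110400000 = 2016/04/20 12:00:00 AM UTC
print(local_day_start)  # 1461090600000 = 2016/04/20 12:00:00 AM IST
```

The 5.5-hour gap between the two values is exactly the discrepancy that caused inconsistent aggregates when summary tables were windowed on local time but incremental reads started on UTC boundaries.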
>>>>> [1] [Analytics] Using UTC across all temporal analytics functions
>>>>>
>>>>> On Fri, Mar 25, 2016 at 10:59 AM, Gihan Anuruddha <[email protected]> wrote:
>>>>>
>>>>>> Actually, currently in ESB-analytics we are also using a similar mechanism. There we keep 5 tables, one for each time unit (second, minute, hour, day, month), and we have a windowOffset corresponding to each time unit, e.g. minute - 60, hour - 3600, etc. We only store the min, max, sum and count values. So at a given time, if we need the average, we divide the sum by the count.
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 8:47 AM, Inosh Goonewardena <[email protected]> wrote:
>>>>>>
>>>>>>> On Thu, Mar 24, 2016 at 9:56 PM, Sachith Withana <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Inosh,
>>>>>>>>
>>>>>>>> We wouldn't have to do that, IMO.
>>>>>>>>
>>>>>>>> We can persist the total aggregate value up to currentTimeWindow - WindowOffset, along with the previous time window's aggregation metadata as well (count and sum, in the average aggregation case).
>>>>>>>
>>>>>>> Yep. That will do.
>>>>>>>
>>>>>>>> The previous total wouldn't be calculated again; it's the last two time windows (including the current one) that we need to recalculate and add to the previous total.
>>>>>>>>
>>>>>>>> It works almost the same way as the current incremental processing table, but keeps more metadata on the aggregation-related values.
>>>>>>>>
>>>>>>>> @Anjana WDYT?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sachith
>>>>>>>>
>>>>>>>> On Fri, Mar 25, 2016 at 7:07 AM, Inosh Goonewardena <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Sachith,
>>>>>>>>>
>>>>>>>>>> *WindowOffset*:
>>>>>>>>>>
>>>>>>>>>> Events might arrive late that belong to a previously processed time window.
>>>>>>>>>> To account for that, we have added an optional parameter that allows the immediately previous time windows to be processed as well (it acts like a buffer).
>>>>>>>>>> Ex: if this is set to 1, apart from the to-be-processed data, data related to the previously processed time window will also be taken for processing.
>>>>>>>>>
>>>>>>>>> This means that if the window offset is set, already processed data will be processed again. How do the aggregate functions work in this case? Actually, what is the plan for supporting aggregate functions? Let's take the average function as an example. Are we going to persist sum and count values per time window and re-calculate the whole average based on the values of all time windows? If so, I would guess we can update the previously processed time windows' sum and count values.
>>>>>>>>>
>>>>>>>>> On Thu, Mar 24, 2016 at 3:50 AM, Sachith Withana <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Srinath,
>>>>>>>>>>
>>>>>>>>>> Please find the comments inline.
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 24, 2016 at 11:39 AM, Srinath Perera <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Sachith, Anjana,
>>>>>>>>>>>
>>>>>>>>>>> +1 for the backend model.
>>>>>>>>>>>
>>>>>>>>>>> Are we handling the case where, at the time the last run was done, only 25 minutes of data had been processed? Basically, the next run has to re-run the last hour and update the value.
>>>>>>>>>>
>>>>>>>>>> Yes. It will recalculate for that hour and will update the value.
>>>>>>>>>>
>>>>>>>>>>> When does the one-hour counting start? Is it from the moment the server starts? That would be probabilistic when you restart. I think we need to either start at a known place (midnight) or let the user configure it.
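The persist-sum-and-count scheme discussed above (keep min, max, sum, count per window; derive the average on demand) can be sketched as mergeable partial aggregates (illustrative Python, not the actual DAS/ESB-analytics implementation):

```python
def summarize(values):
    """Build a partial summary for one time window."""
    return {"min": min(values), "max": max(values),
            "sum": sum(values), "count": len(values)}

def merge(a, b):
    """Combine two partial summaries. Storing only these four values is
    enough to answer min/max/sum/count/avg over any union of windows,
    and to update a previously processed window in place."""
    return {
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
        "sum": a["sum"] + b["sum"],
        "count": a["count"] + b["count"],
    }

minute1 = summarize([4, 10, 6])  # one per-minute summary
minute2 = summarize([2, 8])      # the next minute
hour = merge(minute1, minute2)   # roll minutes up into the hour table
print(hour["sum"] / hour["count"])  # average = 6.0
```

Because `merge` is associative, a window that gets reprocessed (e.g. due to WindowOffset) can simply have its stored sum and count replaced and the higher-level rollups recomputed from the per-window summaries.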
>>>>>>>>>> In the first run, all the available data is processed.
>>>>>>>>>> After that, it calculates the floor of the last processed event's timestamp (timestamp - timestamp % 3600), and that value is used as the start of the time windows.
>>>>>>>>>>
>>>>>>>>>>> I am a bit concerned about the syntax, though. This only works for a very specific type of query (one that includes an aggregate and a group by). What happens if a user does this with a different query? Can we give a clear error message?
>>>>>>>>>>
>>>>>>>>>> Currently the error messages are very generic. We will have to work on improving those messages.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sachith
>>>>>>>>>>
>>>>>>>>>>> --Srinath
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 21, 2016 at 5:15 PM, Sachith Withana <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> We are adding incremental processing capability to Spark in DAS. As the first stage, we added time slicing to Spark execution.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's a quick introduction to that.
>>>>>>>>>>>>
>>>>>>>>>>>> *Execution*:
>>>>>>>>>>>>
>>>>>>>>>>>> In the first run of the script, it will process all the data in the given table and store the last processed event timestamp. From the next run onwards it will start processing from that stored timestamp.
>>>>>>>>>>>>
>>>>>>>>>>>> The last processed event timestamp is not overridden with the new value until the query that contains the data processing part completes. This is to ensure that the data processing for the query wouldn't have to be done again if the whole query fails.
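The execution model described above (process from the stored timestamp, but only make the new timestamp durable once the whole query has succeeded) can be sketched as follows. This is a hypothetical illustration; the class and method names are invented, not the actual DAS implementation:

```python
class IncrementalState:
    """Hypothetical sketch of the commit behavior described above: the
    stored last-processed timestamp is only replaced once the whole
    query has succeeded (the commit step)."""

    def __init__(self):
        self.last_processed = 0   # persisted value
        self._pending = None      # staged value, not yet durable

    def run_query(self, events):
        # Process only events newer than the stored timestamp.
        batch = [e for e in events if e["ts"] > self.last_processed]
        if batch:
            self._pending = max(e["ts"] for e in batch)
        return batch

    def commit(self):
        # Commit step: make the staged timestamp durable. If the query
        # failed before this point, the old timestamp survives and the
        # same data is simply reprocessed on the next run.
        if self._pending is not None:
            self.last_processed = self._pending
            self._pending = None

state = IncrementalState()
state.run_query([{"ts": 10}, {"ts": 25}])
print(state.last_processed)  # still 0: not committed yet
state.commit()
print(state.last_processed)  # 25
```
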
>>>>>>>>>>>> This is ensured by adding a commit query after the main query. Refer to the Syntax section for the example.
>>>>>>>>>>>>
>>>>>>>>>>>> *Syntax*:
>>>>>>>>>>>>
>>>>>>>>>>>> In the Spark script, incremental processing support has to be specified per table; this happens in the create temporary table line.
>>>>>>>>>>>>
>>>>>>>>>>>> ex: CREATE TEMPORARY TABLE T1 USING CarbonAnalytics options (tableName "test",
>>>>>>>>>>>> incrementalProcessing "T1, 3600");
>>>>>>>>>>>>
>>>>>>>>>>>> INSERT INTO T2 SELECT username, age FROM T1 GROUP BY age;
>>>>>>>>>>>>
>>>>>>>>>>>> INC_TABLE_COMMIT T1;
>>>>>>>>>>>>
>>>>>>>>>>>> The last line is what ensures the processing took place successfully, and it replaces the lastProcessed timestamp with the new one.
>>>>>>>>>>>>
>>>>>>>>>>>> *TimeWindow*:
>>>>>>>>>>>>
>>>>>>>>>>>> To do incremental processing, the user has to provide the time window per which the data will be processed. In the above example, the data would be summarized in *1 hour* time windows.
>>>>>>>>>>>>
>>>>>>>>>>>> *WindowOffset*:
>>>>>>>>>>>>
>>>>>>>>>>>> Events might arrive late that belong to a previously processed time window. To account for that, we have added an optional parameter that allows the immediately previous time windows to be processed as well (it acts like a buffer).
>>>>>>>>>>>> Ex: if this is set to 1, apart from the to-be-processed data, data related to the previously processed time window will also be taken for processing.
>>>>>>>>>>>>
>>>>>>>>>>>> *Limitations*:
>>>>>>>>>>>>
>>>>>>>>>>>> Currently, multiple time windows cannot be specified per temporary table in the same script. It would have to be done using different temporary tables.
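The WindowOffset buffering described above can be sketched as follows (hypothetical Python; `windows_to_process` is an illustrative name, not the actual DAS API):

```python
def windows_to_process(last_window_start, now, window_len, offset):
    """Illustrative sketch (not the actual implementation): return the
    window start times to (re)process, from `offset` windows before the
    last processed window up to the window containing `now`."""
    first = last_window_start - offset * window_len
    last = now - now % window_len
    return list(range(first, last + window_len, window_len))

# 1-hour windows (seconds) with WindowOffset 1: the already-processed
# previous window is read again, acting as a buffer for late events.
print(windows_to_process(7200, 10900, 3600, 1))  # [3600, 7200, 10800]
```

With offset 0 only the unprocessed windows are read; each extra unit of offset re-reads one more already-processed window so that late arrivals are folded into its aggregates.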
>>>>>>>>>>>> *Future Improvements:*
>>>>>>>>>>>> - Add aggregation function support for incremental processing
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Sachith
>>>>>>>>>>>> --
>>>>>>>>>>>> Sachith Withana
>>>>>>>>>>>> Software Engineer; WSO2 Inc.; http://wso2.com
>>>>>>>>>>>> E-mail: sachith AT wso2.com
>>>>>>>>>>>> M: +94715518127
>>>>>>>>>>>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ============================
>>>>>>>>>>> Srinath Perera, Ph.D.
>>>>>>>>>>> http://people.apache.org/~hemapani/
>>>>>>>>>>> http://srinathsview.blogspot.com/
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Architecture mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks & Regards,
>>>>>>>>> Inosh Goonewardena
>>>>>>>>> Associate Technical Lead- WSO2 Inc.
>>>>>>>>> Mobile: +94779966317
>>>>>> --
>>>>>> W.G. Gihan Anuruddha
>>>>>> Senior Software Engineer | WSO2, Inc.
>>>>>> M: +94772272595
>>>>
>>>> --
>>>> *V. Mohanadarshan*
>>>> *Associate Tech Lead,*
>>>> *Data Technologies Team,*
>>>> *WSO2, Inc. http://wso2.com*
>>>> *lean.enterprise.middleware.*
>>>> email: [email protected]
>>>> phone: (+94) 771117673
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter: @srinath_perera
> Site: http://home.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902

--
Thanks & Regards,
Inosh Goonewardena
Associate Technical Lead- WSO2 Inc.
Mobile: +94779966317
