Hi Tishan,

Yes. Assuming the batch size is 5 and the time window is 20 mins, only 5 of 10 events that arrived within the last 5 mins would be processed because of the batch size constraint (even though all 10 would be processed if time alone were considered).

Having separate implementations would work in the majority of scenarios, since usually only time OR length is applicable, not both. However, two implementations would cause trouble in situations where both the time and the length constraints matter (equivalent to an AND operation on the two constraints).

If we need only one of the two constraints, we can use a very large value for the other parameter (e.g. if we only need to limit the number of events by a time constraint of 1 sec, we can specify 1,000,000 for the batch size, assuming we know in advance that 1,000,000 events would never arrive within 1 sec).

IMHO neither of the two options (separate or single implementation) is perfect for every scenario. However, as I understand it, a single implementation would address more cases. What's your opinion on this?
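To make the AND behaviour concrete, here is a rough sketch in Python (not the actual Siddhi extension code). The class name TimeLengthWindow, its parameters and the millisecond timestamps are all made up for illustration. It reproduces the example above: 10 events within 5 mins, a 20 min time window and a batch size of 5.

```python
from collections import deque


class TimeLengthWindow:
    """Hypothetical window that keeps only events satisfying BOTH limits."""

    def __init__(self, duration_ms, batch_size):
        self.duration_ms = duration_ms  # time constraint
        self.batch_size = batch_size    # length constraint
        self.events = deque()           # (timestamp_ms, payload), oldest first

    def add(self, timestamp_ms, payload):
        """Add an event and return the events that remain in the window."""
        self.events.append((timestamp_ms, payload))
        # Time constraint: drop events older than the duration.
        cutoff = timestamp_ms - self.duration_ms
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
        # Length constraint: keep at most batch_size of the latest events.
        while len(self.events) > self.batch_size:
            self.events.popleft()
        return list(self.events)


def run(window, timestamps):
    """Feed events into the window and return what is left at the end."""
    kept = []
    for payload, ts in enumerate(timestamps):
        kept = window.add(ts, payload)
    return kept


# Ten events arriving 30 sec apart, i.e. all within the last 5 mins.
timestamps = [i * 30_000 for i in range(10)]
twenty_mins = 20 * 60 * 1000

# Both constraints apply: the batch size of 5 is the tighter one.
both = run(TimeLengthWindow(twenty_mins, 5), timestamps)

# Effectively time only: a very large batch size never triggers.
time_only = run(TimeLengthWindow(twenty_mins, 1_000_000), timestamps)

print(len(both), len(time_only))  # 5 10
```

With a single implementation like this, either constraint can be switched off in practice by giving it a value that is never reached, whereas two separate implementations cannot express the combined case.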
Thanks

On Thu, Jun 2, 2016 at 10:14 AM, Charini Nanayakkara <[email protected]> wrote:

> Hi All,
>
> I plan to extend the existing regression function by adding a time
> parameter. Regression is a function available in the Siddhi stream
> processor extension known as timeseries. In the current implementation,
> the regression function consumes two or more parameters and performs
> regression as follows.
>
> The mandatory parameters are the dependent attribute Y and the
> independent attribute(s) X1, X2, ..., Xn. For simple linear regression,
> only one independent attribute is given. Two or more independent
> attributes are given for multiple linear regression.
>
> timeseries:regress(Y, X1, X2, ..., Xn)
>
> The three optional parameters are calculation interval, batch size and
> confidence interval (ci). Where those are not specified, the default
> values are assumed.
>
> timeseries:regress(calcInterval, batchSize, ci, Y, X1, X2, ..., Xn)
>
> Batch size works as a length window in this implementation, which allows
> one to restrict the number of events considered when executing regression
> in real time. For example, if the length is 5, only the latest 5 events
> (the current event and the 4 events prior to it) would be used for
> performing regression.
>
> *The suggested extension would allow the user to restrict the number of
> events based on a time window as well, rather than on length only.
> Therefore, once my task is complete, the regression function would
> consume duration as an additional parameter.*
>
> *timeseries:regress(calcInterval, duration, batchSize, ci, Y, X1,
> X2, ..., Xn)*
>
> Here the parameter 'duration' would consist of two parts, where the first
> part specifies the number and the second part specifies the unit (e.g. 2
> sec, 5 mins, 7 days). On arrival of each event, the past events to be
> considered for regression would be determined by this 'duration' (e.g. if
> a new event arrives at 10:00 a.m. and the duration is 5 mins, only the
> events which arrived between 9:55 a.m. and 10:00 a.m. are considered for
> regression).
>
> Suggestions and comments are most welcome.
>
> Thank you.
>
> --
> Charini Vimansha Nanayakkara
> Software Engineer at WSO2
> Mobile: 0714126293

--
Charini Vimansha Nanayakkara
Software Engineer at WSO2
Mobile: 0714126293
