Hi Seshika,

Noted with thanks. In that case, what name would you suggest for these new extensions (i.e. the new regression, outlier and forecast functions)?
Regards,
Charini

On Tue, Jun 7, 2016 at 7:13 PM, Seshika Fernando <[email protected]> wrote:
> No, the existing regression function should remain, as it's a different use case. There are many instances where we need to perform regression on a set of events that is not limited by a time duration; in those cases, the existing regression implementation will be used. However, the duration parameter should be available for the outlier and forecast extensions as well, so a user should be able to use the outlier/forecast regression functions with or without the duration parameter. So your new version should be applicable to all 3 regression extensions (linear regression, outlier, forecast).
>
> seshi
>
> On Tue, Jun 7, 2016 at 6:32 PM, Charini Nanayakkara <[email protected]> wrote:
>
>> Hi Seshika, Suho,
>>
>> Is the existing regression function to be entirely replaced by the new one? If so, it's necessary to change the implementation of the outlier and forecast extensions as well, since those are based on the regression implementation. Furthermore, there's the concern of existing applications being rendered useless if the old version is entirely removed. If it's preferred to keep both, a new name is required for this extension. Since the regression function supports both time and length, a name such as regressTimeLength would be appropriate IMO. Please give your suggestions.
>>
>> Regards,
>> Charini
>>
>> On Sun, Jun 5, 2016 at 11:28 AM, Seshika Fernando <[email protected]> wrote:
>>
>>> Hi,
>>> The length ceiling is necessary along with the duration parameter. The reason the batch size was originally implemented was to optimize performance when large datasets are considered for regression. We need to be able to give an upper bound.
>>> So, for example, if a user uses a large duration (24 hours) and there are millions of events, a batch size of 1 million will make the function consider the last 1 million events within the last 24 hours, which is a valid use case.
>>>
>>> For this reason, the ability to specify both duration and batch size is important.
>>>
>>> Seshi
>>>
>>> On 2 Jun 2016 14:45, "Charini Nanayakkara" <[email protected]> wrote:
>>>
>>>> Noted with thanks. Will proceed with the implementation accordingly.
>>>>
>>>> Charini
>>>>
>>>> On Thu, Jun 2, 2016 at 2:28 PM, Sriskandarajah Suhothayan <[email protected]> wrote:
>>>>
>>>>> I think having batchSize & duration will be good, as this will limit the number of events considered; this can help improve performance as well.
>>>>>
>>>>> Suho
>>>>>
>>>>> On Thu, Jun 2, 2016 at 1:59 PM, Charini Nanayakkara <[email protected]> wrote:
>>>>>
>>>>>> Hi Tishan,
>>>>>>
>>>>>> For my requirement, having a time window alone is adequate, so your point might be valid. However, I'm concerned about the reusability of the extension.
>>>>>>
>>>>>> @Srinath, WDYT? Which would be the better option: a single implementation or two different ones?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Thu, Jun 2, 2016 at 1:48 PM, Tishan Dahanayakage <[email protected]> wrote:
>>>>>>
>>>>>>> Charini,
>>>>>>>
>>>>>>> My knowledge of this domain is sparse, hence I do not know whether a scenario requiring both time AND length is a valid business case. If it is, +1 for the design that includes both parameters in the same implementation.
>>>>>>>
>>>>>>> Thanks
>>>>>>> /Tishan
>>>>>>>
>>>>>>> On Thu, Jun 2, 2016 at 12:54 PM, Charini Nanayakkara <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Tishan,
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>> Assuming the batch size is 5 and the time window is 20 mins, only 5 out of 10 events that arrive within the last 5 mins would be processed, due to the batch size constraint (even though all of them would be processed if time alone were considered). Having separate implementations would work in the majority of scenarios, since usually only time OR length is applicable, but not both. However, two implementations would cause trouble in situations where both time and length matter (equivalent to an AND of the two constraints). If we need only one of the two constraints, we can use a very large value for the other parameter (e.g. if we only need to limit the number of events with a time constraint of 1 sec, we can specify 1,000,000 as the batch size, given prior knowledge that 1,000,000 events would never arrive within 1 sec). IMHO neither option (separate or single implementation) is perfect for every scenario. However, as I understand it, a single implementation would address more cases. What's your opinion on this?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Thu, Jun 2, 2016 at 10:14 AM, Charini Nanayakkara <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I plan to extend the existing regression function by adding a time parameter. Regression is a function provided by timeseries, an extension of the Siddhi stream processor. In the current implementation, the regression function consumes two or more parameters and performs regression as follows.
>>>>>>>>>
>>>>>>>>> The mandatory parameters are the dependent attribute Y and the independent attribute(s) X1, X2, ..., Xn.
>>>>>>>>> For simple linear regression, only one independent attribute is given; two or more independent attributes are consumed for multiple linear regression.
>>>>>>>>>
>>>>>>>>> timeseries:regress(Y, X1, X2, ..., Xn)
>>>>>>>>>
>>>>>>>>> The other three, optional, parameters are the calculation interval, batch size and confidence interval (ci). When they are not specified, default values are assumed.
>>>>>>>>>
>>>>>>>>> timeseries:regress(calcInterval, batchSize, ci, Y, X1, X2, ..., Xn)
>>>>>>>>>
>>>>>>>>> Batch size works as a length window in this implementation, restricting the number of events considered when executing regression in real time. For example, if the length is 5, only the latest 5 events (the current event and the 4 events prior to it) are used for regression.
>>>>>>>>>
>>>>>>>>> *The suggested extension would also allow the user to restrict the number of events based on a time window, rather than by length only. Therefore, once my task is complete, the regression function will consume duration as an additional parameter:*
>>>>>>>>>
>>>>>>>>> *timeseries:regress(calcInterval, duration, batchSize, ci, Y, X1, X2, ..., Xn)*
>>>>>>>>>
>>>>>>>>> Here the 'duration' parameter comprises two parts: the first specifies the number and the second the unit (e.g. 2 sec, 5 mins, 7 days). On arrival of each event, the past events considered for regression are selected based on this 'duration' (e.g. if a new event arrives at 10.00 a.m. and the duration is 5 mins, only the events that arrived between 9.55 a.m. and 10.00 a.m. are considered for regression).
>>>>>>>>>
>>>>>>>>> Suggestions and comments are most welcome.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Charini Vimansha Nanayakkara
>>>>>>>>> Software Engineer at WSO2
>>>>>>>>> Mobile: 0714126293
>>>>>>>
>>>>>>> --
>>>>>>> Tishan Dahanayakage
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.
>>>>>>> Mobile:+94 716481328
>>>>>>>
>>>>>>> Disclaimer: This communication may contain privileged or other confidential information and is intended exclusively for the addressee/s. If you are not the intended recipient/s, or believe that you may have received this communication in error, please reply to the sender indicating that fact and delete the copy you received and in addition, you should not print, copy, re-transmit, disseminate, or otherwise use the information contained in this communication. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. The sender does not accept liability for any errors or omissions.
>>>>>
>>>>> --
>>>>> *S. Suhothayan*
>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>> *WSO2 Inc.* http://wso2.com
>>>>> lean . enterprise . middleware
>>>>>
>>>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ | twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan
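The combined constraint discussed in this thread (consider only events within the duration AND at most the latest batchSize events) can be sketched as below. This is a minimal Python sketch under stated assumptions, not the timeseries extension's actual code: parse_duration, TimeLengthWindow and simple_linear_regression are hypothetical names, the accepted unit spellings are assumed, events are assumed to arrive in timestamp order, and only simple linear regression (a single X) is shown, leaving out multiple regression, calcInterval and ci.

```python
import re
from collections import deque
from datetime import datetime, timedelta

# Hypothetical unit spellings; the real extension may accept a different set.
_UNITS = {
    "sec": "seconds", "secs": "seconds", "second": "seconds", "seconds": "seconds",
    "min": "minutes", "mins": "minutes", "minute": "minutes", "minutes": "minutes",
    "hour": "hours", "hours": "hours",
    "day": "days", "days": "days",
}


def parse_duration(text):
    """Hypothetical parser for '<number> <unit>' strings such as '2 sec' or '7 days'."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2).lower()]: int(match.group(1))})


class TimeLengthWindow:
    """Hypothetical window applying both constraints (AND): an event is kept only
    if it arrived within `duration` of the newest event AND is among the latest
    `batch_size` events. Assumes events arrive in timestamp order."""

    def __init__(self, duration, batch_size):
        self.duration = duration
        # A bounded deque silently drops the oldest event beyond batch_size.
        self.events = deque(maxlen=batch_size)

    def add(self, timestamp, y, x):
        """Add an event and return the (timestamp, y, x) events to regress on."""
        self.events.append((timestamp, y, x))
        cutoff = timestamp - self.duration
        # Evict events strictly older than the cutoff; one exactly at it is kept.
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
        return list(self.events)


def simple_linear_regression(events):
    """Ordinary least-squares fit of y = intercept + slope * x.
    Needs at least two distinct x values, otherwise sxx is zero."""
    n = len(events)
    mean_x = sum(x for _, _, x in events) / n
    mean_y = sum(y for _, y, _ in events) / n
    sxx = sum((x - mean_x) ** 2 for _, _, x in events)
    sxy = sum((x - mean_x) * (y - mean_y) for _, y, x in events)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Batch size 5, 20-minute window, 10 events within 5 minutes:
    # the length cap wins, so only the latest 5 events are used.
    window = TimeLengthWindow(parse_duration("20 mins"), batch_size=5)
    start = datetime(2016, 6, 2, 9, 55)
    for i in range(10):
        kept = window.add(start + timedelta(seconds=30 * i), y=2.0 * i + 1.0, x=float(i))
    print(len(kept))                       # 5
    print(simple_linear_regression(kept))  # (1.0, 2.0)

    # A very large batch size leaves only the 5-minute duration in effect:
    # for an event at 10.00, only events from 9.55 onwards are kept.
    window = TimeLengthWindow(parse_duration("5 mins"), batch_size=1_000_000)
    arrivals = [datetime(2016, 6, 2, 9, m) for m in (50, 54, 55, 58)]
    arrivals.append(datetime(2016, 6, 2, 10, 0))
    for i, ts in enumerate(arrivals):
        kept = window.add(ts, y=float(i), x=float(i))
    print([ts.strftime("%H:%M") for ts, _, _ in kept])  # ['09:55', '09:58', '10:00']
```

Setting batch_size very large mirrors the suggestion in the thread for applying only the time constraint; conversely, a very long duration would leave only the length cap in effect, approximating the existing length-only behaviour.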
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
