Hi Seshika,

Noted with thanks. In that case, what names would you suggest for these new
extensions (i.e. the new regression, outlier and forecast functions)?

Regards,
Charini

On Tue, Jun 7, 2016 at 7:13 PM, Seshika Fernando <[email protected]> wrote:

> No, the existing regression function should remain, as it's a different
> use case. There are many instances where we need to perform regression on a
> set of events that are not limited by a time duration. In that case, the
> existing regression implementation will be used. However, the duration
> parameter should be available for the outlier and forecast extensions as
> well, so a user should be able to use the outlier/forecast regression
> functions with or without the duration parameter. Therefore, your new
> version should be applicable to all 3 regression extensions (linear
> regression, outlier, forecast).
>
> seshi
>
> On Tue, Jun 7, 2016 at 6:32 PM, Charini Nanayakkara <[email protected]>
> wrote:
>
>> Hi Seshika, Suho,
>>
>> Is the existing regression function to be entirely replaced by the new
>> one? If so, the implementations of the outlier and forecast extensions must
>> change as well, since they are based on the regression implementation.
>> Furthermore, existing applications would stop working if the old version
>> were removed entirely. If we prefer to keep both, the new extension needs a
>> new name. Since the regression function supports both time and length, a
>> name such as regressTimeLength would be appropriate IMO. Please give your
>> suggestions.
>>
>> Regards,
>> Charini
>>
>> On Sun, Jun 5, 2016 at 11:28 AM, Seshika Fernando <[email protected]>
>> wrote:
>>
>>> Hi,
>>> The length ceiling is necessary along with the duration parameter. The
>>> batch size was originally implemented to optimize performance when large
>>> datasets are considered for regression, so we need to be able to give an
>>> upper bound. For example, if a user specifies a large duration (24 hours)
>>> and there are millions of events, a batch size of 1 million means only the
>>> last 1 million events within the last 24 hours are considered, which is a
>>> valid use case.
>>>
>>> For this reason, the ability to specify both duration and batch size is
>>> important.
>>>
>>> Seshi
>>> On 2 Jun 2016 14:45, "Charini Nanayakkara" <[email protected]> wrote:
>>>
>>>> Noted with thanks. Will proceed with the implementation likewise.
>>>>
>>>> Charini
>>>>
>>>> On Thu, Jun 2, 2016 at 2:28 PM, Sriskandarajah Suhothayan <
>>>> [email protected]> wrote:
>>>>
>>>>> I think having both batchSize and duration will be good, as this will
>>>>> limit the number of events considered, which can also help improve
>>>>> performance.
>>>>>
>>>>> Suho
>>>>>
>>>>> On Thu, Jun 2, 2016 at 1:59 PM, Charini Nanayakkara <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Tishan,
>>>>>>
>>>>>> For my requirement, a time window alone is adequate, so your point may
>>>>>> be valid. However, I'm concerned about the reusability of the
>>>>>> extension.
>>>>>>
>>>>>> @Srinath, WDYT? Which would be the better option: a single
>>>>>> implementation or two separate ones?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Thu, Jun 2, 2016 at 1:48 PM, Tishan Dahanayakage <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Charini,
>>>>>>>
>>>>>>> My knowledge of this domain is sparse, so I do not know whether a
>>>>>>> scenario requiring both time AND length is a valid business case. If
>>>>>>> it is, +1 for a design that includes both parameters in the same
>>>>>>> implementation.
>>>>>>>
>>>>>>> Thanks
>>>>>>> /Tishan
>>>>>>>
>>>>>>> On Thu, Jun 2, 2016 at 12:54 PM, Charini Nanayakkara <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Tishan,
>>>>>>>>
>>>>>>>> Yes. Assuming the batch size is 5 and the time window is 20 mins, only 5
>>>>>>>> of the 10 events that arrived within the last 5 mins would be processed
>>>>>>>> due to the batch size constraint (even though all 10 would be processed
>>>>>>>> if time alone were considered). Separate implementations would work for
>>>>>>>> the majority of scenarios, since usually only time OR length is
>>>>>>>> applicable, not both. However, two implementations would cause trouble
>>>>>>>> in situations where both the time factor and the length are important
>>>>>>>> (equivalent to an AND operation on the two constraints). If our
>>>>>>>> requirement is to have only one of the two constraints, we can use a
>>>>>>>> very large value for the other parameter (e.g. if we only need to limit
>>>>>>>> the number of events with a 1 sec time constraint, we can specify
>>>>>>>> 1,000,000 as the batch size, provided we know that 1,000,000 events
>>>>>>>> would never arrive within 1 sec). IMHO neither option (separate or
>>>>>>>> single implementation) is perfect for every scenario. However, as I
>>>>>>>> understand it, a single implementation would address more cases.
>>>>>>>> What's your opinion on this?
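The AND semantics described above (apply the time window, then cap the survivors at the batch size), including the trick of passing a very large batch size to get effectively time-only behaviour, can be sketched in Java, the language Siddhi extensions are written in. This is a minimal illustration, not the timeseries extension's code: the class `TimeAndLengthWindow` and its `add` method are hypothetical names, and events are assumed to arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: an event is used for regression only if it arrived within
// the last durationMillis AND it is among the latest batchSize such events.
// Assumes events arrive in non-decreasing timestamp order.
class TimeAndLengthWindow {
    private final long durationMillis;
    private final int batchSize;
    // Arrival timestamps (millis) of events currently in the window, oldest first.
    private final Deque<Long> timestamps = new ArrayDeque<>();

    TimeAndLengthWindow(long durationMillis, int batchSize) {
        if (durationMillis < 0 || batchSize < 1) {
            throw new IllegalArgumentException("duration must be >= 0 and batchSize >= 1");
        }
        this.durationMillis = durationMillis;
        this.batchSize = batchSize;
    }

    /** Adds an event arriving at nowMillis; returns how many events would feed the regression. */
    int add(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Time constraint: drop events that arrived before nowMillis - duration.
        long cutoff = nowMillis - durationMillis;
        while (timestamps.peekFirst() < cutoff) {
            timestamps.removeFirst();
        }
        // Length constraint: of the remaining events, keep only the latest batchSize.
        while (timestamps.size() > batchSize) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }
}
```

With a 20 min duration and a batch size of 5, ten events arriving 30 s apart leave 5 events for the regression; with a batch size of 1,000,000 all 10 remain, matching the example in the message above.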
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Thu, Jun 2, 2016 at 10:14 AM, Charini Nanayakkara <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I plan to extend the existing regression function by adding a time
>>>>>>>>> parameter. Regression is a function provided by timeseries, a Siddhi
>>>>>>>>> stream processor extension. In the current implementation, the
>>>>>>>>> regression function takes two or more parameters and performs
>>>>>>>>> regression as follows.
>>>>>>>>>
>>>>>>>>> The mandatory parameters are the dependent attribute Y and the
>>>>>>>>> independent attribute(s) X1, X2, ..., Xn. For simple linear
>>>>>>>>> regression, a single independent attribute is given; for multiple
>>>>>>>>> linear regression, two or more independent attributes are given.
>>>>>>>>>
>>>>>>>>> timeseries:regress(Y, X1, X2, ..., Xn)
>>>>>>>>>
>>>>>>>>> The three optional parameters are the calculation interval, batch size
>>>>>>>>> and confidence interval (ci). If they are not specified, default
>>>>>>>>> values are used.
>>>>>>>>>
>>>>>>>>> timeseries:regress(calcInterval, batchSize, ci, Y, X1, X2, ..., Xn)
>>>>>>>>>
>>>>>>>>> In this implementation, the batch size works as a length window,
>>>>>>>>> restricting the number of events considered when executing regression
>>>>>>>>> in real time. For example, if the batch size is 5, only the latest 5
>>>>>>>>> events (the current event and the 4 events before it) are used for
>>>>>>>>> regression.
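To illustrate the batch size acting as a length window, the sketch below fits a simple linear regression (one independent attribute) by ordinary least squares over only the latest batchSize events. It is a hypothetical Java sketch, not the timeseries extension's actual code: `LengthWindowRegression` is an invented name, and the calculation interval, confidence interval and multiple regression are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: batchSize used as a length window for simple linear regression
// y = intercept + slope * x, refitted by ordinary least squares on every new event.
class LengthWindowRegression {
    private final int batchSize;
    // Each entry is {x, y}; oldest event first.
    private final Deque<double[]> window = new ArrayDeque<>();

    LengthWindowRegression(int batchSize) {
        if (batchSize < 2) {
            throw new IllegalArgumentException("batchSize must be >= 2");
        }
        this.batchSize = batchSize;
    }

    /** Adds an event and returns {intercept, slope} fitted over the latest batchSize events. */
    double[] add(double x, double y) {
        window.addLast(new double[] {x, y});
        if (window.size() > batchSize) {
            window.removeFirst(); // drop the oldest event once the length bound is exceeded
        }
        return fit();
    }

    int size() {
        return window.size();
    }

    private double[] fit() {
        int n = window.size();
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (double[] p : window) {
            sumX += p[0];
            sumY += p[1];
            sumXY += p[0] * p[1];
            sumXX += p[0] * p[0];
        }
        double denom = n * sumXX - sumX * sumX;
        if (n < 2 || denom == 0) {
            return new double[] {Double.NaN, Double.NaN}; // slope undefined for fewer than 2 distinct x values
        }
        double slope = (n * sumXY - sumX * sumY) / denom;
        double intercept = (sumY - slope * sumX) / n;
        return new double[] {intercept, slope};
    }
}
```

With a batch size of 5, two outlying early events stop influencing the fit as soon as five newer events have arrived.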
>>>>>>>>>
>>>>>>>>> *The proposed extension would also allow the user to restrict the
>>>>>>>>> number of events with a time window, rather than by length alone.
>>>>>>>>> Therefore, once my task is complete, the regression function will take
>>>>>>>>> duration as an additional parameter.*
>>>>>>>>>
>>>>>>>>> *timeseries:regress(calcInterval, duration, batchSize, ci, Y, X1, X2,
>>>>>>>>> ..., Xn)*
>>>>>>>>>
>>>>>>>>> Here the 'duration' parameter consists of two parts: a number followed
>>>>>>>>> by a unit (e.g. 2 sec, 5 mins, 7 days). On arrival of each event, the
>>>>>>>>> past events considered for regression are selected based on this
>>>>>>>>> 'duration' (i.e. if a new event arrives at 10.00 a.m. and the duration
>>>>>>>>> is 5 mins, only the events that arrived between 9.55 a.m. and 10.00
>>>>>>>>> a.m. are considered for regression).
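The 'duration' semantics can be sketched the same way: parse a "number unit" string, then on each arrival evict events older than the arrival time minus the duration, so the cutoff itself (9.55 a.m. in the example) stays inside the window. The accepted unit spellings and the `DurationWindow` class are assumptions for illustration only, and events are assumed to arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the 'duration' parameter: "<number> <unit>" is parsed into
// milliseconds, and only events with timestamps in [now - duration, now] are kept.
// The unit spellings below are assumptions, not the extension's documented syntax.
class DurationWindow {
    private final long durationMillis;
    // Arrival timestamps (millis) of events currently in the window, oldest first.
    private final Deque<Long> timestamps = new ArrayDeque<>();

    DurationWindow(String duration) {
        this.durationMillis = parseDurationMillis(duration);
    }

    static long parseDurationMillis(String duration) {
        String[] parts = duration.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected '<number> <unit>', got: " + duration);
        }
        long amount = Long.parseLong(parts[0]);
        TimeUnit unit;
        switch (parts[1].toLowerCase(Locale.ROOT)) {
            case "sec": case "secs": case "second": case "seconds": unit = TimeUnit.SECONDS; break;
            case "min": case "mins": case "minute": case "minutes": unit = TimeUnit.MINUTES; break;
            case "hour": case "hours": unit = TimeUnit.HOURS; break;
            case "day": case "days": unit = TimeUnit.DAYS; break;
            default: throw new IllegalArgumentException("unknown time unit: " + parts[1]);
        }
        return unit.toMillis(amount);
    }

    /** Records an event arriving at nowMillis; returns how many events remain in the window. */
    int add(long nowMillis) {
        timestamps.addLast(nowMillis);
        long cutoff = nowMillis - durationMillis;
        // Evict strictly older events; an event exactly at the cutoff is kept.
        while (timestamps.peekFirst() < cutoff) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }
}
```

For a 5 min duration, events at 9.54, 9.55, 9.58 and 10.00 a.m. leave three events (9.55 to 10.00 a.m.) in the window.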
>>>>>>>>>
>>>>>>>>> Suggestions and comments are most welcome.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Charini Vimansha Nanayakkara
>>>>>>>>> Software Engineer at WSO2
>>>>>>>>> Mobile: 0714126293
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Tishan Dahanayakage
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.
>>>>>>> Mobile:+94 716481328
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *S. Suhothayan*
>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>> *WSO2 Inc.* http://wso2.com
>>>>> lean . enterprise . middleware
>>>>>
>>>>>
>>>>> *cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ |
>>>>> twitter: http://twitter.com/suhothayan | linked-in:
>>>>> http://lk.linkedin.com/in/suhothayan*
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>>
>


_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
