Hi Seshika, Suho,

Is the existing regression function to be replaced entirely by the new one?
If so, the outlier and forecast extensions must change as well, since they
are based on the regression implementation. There is also the concern that
existing applications would stop working if the old version is removed
entirely. If we prefer to keep both, this extension needs a new name. Since
the new regression function supports both time and length, a name such as
regressTimeLength would be appropriate IMO. Please give your suggestions.

Regards,
Charini

On Sun, Jun 5, 2016 at 11:28 AM, Seshika Fernando <[email protected]> wrote:

> Hi,
> The length ceiling is necessary along with the duration parameter. The
> batch size was originally implemented to optimize performance when large
> datasets are considered for regression. We need to be able to give an
> upper bound. For example, if the user specifies a large duration (24
> hours) and there are millions of events, a batch size of 1 million means
> only the last 1 million events within the last 24 hours are considered,
> which is a valid use case.
>
> For this reason, the ability to specify both duration and batch size is
> important.
>
> Seshi
> On 2 Jun 2016 14:45, "Charini Nanayakkara" <[email protected]> wrote:
>
>> Noted with thanks. Will proceed with the implementation likewise.
>>
>> Charini
>>
>> On Thu, Jun 2, 2016 at 2:28 PM, Sriskandarajah Suhothayan <[email protected]>
>> wrote:
>>
>>> I think having both batchSize & duration will be good, as this will limit
>>> the number of events considered, which can help improve performance as well.
>>>
>>> Suho
>>>
>>> On Thu, Jun 2, 2016 at 1:59 PM, Charini Nanayakkara <[email protected]>
>>> wrote:
>>>
>>>> Hi Tishan,
>>>>
>>>> For my requirement, having the time window alone is adequate, so your
>>>> point might be valid. However, I'm concerned about the reusability of
>>>> the extension.
>>>>
>>>> @Srinath, WDYT? Which would be the better option? Having a single
>>>> implementation or two different ones?
>>>>
>>>> Thanks
>>>>
>>>> On Thu, Jun 2, 2016 at 1:48 PM, Tishan Dahanayakage <[email protected]>
>>>> wrote:
>>>>
>>>>> Charini,
>>>>>
>>>>> My knowledge of this domain is sparse, so I do not know whether a
>>>>> scenario requiring both time AND length is a valid business case. If it
>>>>> is, +1 for the design that includes both parameters in the same
>>>>> implementation.
>>>>>
>>>>> Thanks
>>>>> /Tishan
>>>>>
>>>>> On Thu, Jun 2, 2016 at 12:54 PM, Charini Nanayakkara <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Tishan,
>>>>>>
>>>>>> Yes. Assuming the batch size is 5 and the time window is 20 mins, only 5
>>>>>> of the 10 events which arrived within the last 5 mins would be processed,
>>>>>> due to the batch size constraint (even though all 10 would be processed
>>>>>> if time alone were considered). Separate implementations would work in
>>>>>> the majority of scenarios, since usually only time OR length is
>>>>>> applicable, not both. However, two implementations would cause trouble
>>>>>> where both the time factor and the length are important (equivalent to
>>>>>> an AND operation on the two constraints). If we need only one of the two
>>>>>> constraints, we can use a very large value for the other parameter (e.g.
>>>>>> if we only need to limit the events by a time = 1 sec constraint, we can
>>>>>> specify 1,000,000 for the batch size, assuming we know in advance that
>>>>>> 1,000,000 events would never arrive within 1 sec). IMHO neither of the
>>>>>> two options (separate or single implementation) is perfect for every
>>>>>> scenario. However, a single implementation would address more cases, as
>>>>>> I understand it. What's your opinion on this?
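
[Editor's sketch] The AND of the two constraints described above can be
illustrated with a small Python model. This is only a sketch under stated
assumptions, not the Siddhi extension's code: the function name
combined_window, the (timestamp, value) event shape and the inclusive window
boundary are all hypothetical.

```python
from collections import deque


def combined_window(events, duration_seconds, batch_size):
    """Return the events retained after the last arrival.

    events: iterable of (timestamp_seconds, value) pairs in arrival order.
    An event is retained only if it is among the latest `batch_size` events
    AND no older than `duration_seconds` (boundary assumed inclusive).
    """
    window = deque(maxlen=batch_size)  # length constraint: oldest is dropped
    for timestamp, value in events:
        window.append((timestamp, value))
        # Time constraint: evict events older than the duration.
        while window[0][0] < timestamp - duration_seconds:
            window.popleft()
    return list(window)


# 10 events, one every 30 seconds, all within the last 5 minutes.
events = [(30 * i, i) for i in range(10)]

# Batch size 5 and a 20 minute window: only the latest 5 events survive.
print(len(combined_window(events, duration_seconds=20 * 60, batch_size=5)))  # 5

# A very large batch size leaves the 1 second window as the only real limit.
print(combined_window([(0, "a"), (1, "b"), (3, "c")], 1, 1_000_000))  # [(3, 'c')]
```

The second call mirrors the workaround mentioned above: a batch size that
can never be reached effectively disables the length constraint.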
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Thu, Jun 2, 2016 at 10:14 AM, Charini Nanayakkara <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I plan to extend the existing regression function by adding a time
>>>>>>> parameter. Regression is a function of the Siddhi stream processor
>>>>>>> extension known as timeseries. In the current implementation, the
>>>>>>> regression function takes two or more parameters and performs
>>>>>>> regression as follows.
>>>>>>>
>>>>>>> The mandatory parameters are the dependent attribute Y and the
>>>>>>> independent attribute(s) X1, X2, ..., Xn. For simple linear regression,
>>>>>>> a single independent attribute is given. Two or more independent
>>>>>>> attributes are given for multiple linear regression.
>>>>>>>
>>>>>>> timeseries:regress(Y, X1, X2......,Xn)
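
[Editor's sketch] For readers unfamiliar with the simple case (one
independent attribute), ordinary least squares can be computed as below. This
is a textbook formulation written for illustration; it is not taken from the
timeseries extension, and the function name is hypothetical.

```python
def simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # undefined when all x values are identical
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Points lying exactly on y = 1 + 2x.
print(simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```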
>>>>>>>
>>>>>>> The three optional parameters are the calculation interval, the batch
>>>>>>> size and the confidence interval (ci). Where these are not specified,
>>>>>>> default values are assumed.
>>>>>>>
>>>>>>> timeseries:regress(calcInterval, batchSize, ci, Y, X1, X2......,Xn)
>>>>>>>
>>>>>>> Batch size works as a length window in this implementation, which
>>>>>>> restricts the number of events considered when executing regression
>>>>>>> in real time. For example, if the batch size is 5, only the latest 5
>>>>>>> events (the current event and the 4 events prior to it) are used for
>>>>>>> performing regression.
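
[Editor's sketch] The length-window behaviour described above can be
modelled in a few lines of Python. The function name latest_events is
hypothetical and the code only illustrates the idea; it is not the
extension's implementation.

```python
from collections import deque


def latest_events(events, batch_size):
    """Return the events a length window of `batch_size` would retain."""
    window = deque(maxlen=batch_size)  # the oldest event is dropped when full
    for event in events:
        window.append(event)
    return list(window)


# Seven events arrive; with a batch size of 5 only the latest five remain.
print(latest_events([10, 11, 12, 13, 14, 15, 16], batch_size=5))
# [12, 13, 14, 15, 16]
```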
>>>>>>>
>>>>>>> *The suggested extension would also allow the user to restrict the
>>>>>>> number of events based on a time window, in addition to constraining
>>>>>>> by length. The regression function would therefore take duration as an
>>>>>>> additional parameter once my task is complete.*
>>>>>>>
>>>>>>> *timeseries:regress(calcInterval, duration, batchSize, ci, Y, X1,
>>>>>>> X2......,Xn).*
>>>>>>>
>>>>>>> Here the parameter 'duration' consists of two parts: the first
>>>>>>> specifies the number and the second specifies the unit (e.g. 2 sec,
>>>>>>> 5 mins, 7 days). On arrival of each event, the past events considered
>>>>>>> for regression are selected based on this 'duration' (i.e. if a new
>>>>>>> event arrives at 10.00 a.m. and the duration is 5 mins, only the events
>>>>>>> which arrived between 9.55 a.m. and 10.00 a.m. are considered for
>>>>>>> regression).
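
[Editor's sketch] The two-part 'duration' and the resulting time window can
be modelled as follows. Everything here is an assumption made for
illustration: the unit spellings in UNIT_SECONDS, the function names, the
(timestamp, value) event shape and the inclusive lower boundary (an event at
exactly 9.55 a.m. is kept). The thread does not specify these details.

```python
import re
from collections import deque

# Hypothetical unit table; the accepted spellings are not given in the thread.
UNIT_SECONDS = {
    "sec": 1, "min": 60, "mins": 60,
    "hour": 3600, "hours": 3600, "day": 86400, "days": 86400,
}


def parse_duration(text):
    """Convert a string such as '5 mins' into a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text.lower())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def events_in_window(events, duration_seconds):
    """Return the events still inside the time window after the last arrival.

    events: iterable of (timestamp_seconds, value) pairs in arrival order.
    """
    window = deque()
    for timestamp, value in events:
        window.append((timestamp, value))
        # Evict events that arrived before (current time - duration).
        while window[0][0] < timestamp - duration_seconds:
            window.popleft()
    return list(window)


def clock(hours, minutes):
    """Seconds since midnight, used here as a simple timestamp."""
    return hours * 3600 + minutes * 60


arrivals = [(clock(9, 50), "a"), (clock(9, 55), "b"),
            (clock(9, 58), "c"), (clock(10, 0), "d")]
kept = events_in_window(arrivals, parse_duration("5 mins"))
print([value for _, value in kept])  # ['b', 'c', 'd']
```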
>>>>>>>
>>>>>>> Suggestions and comments are most welcome.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> --
>>>>>>> Charini Vimansha Nanayakkara
>>>>>>> Software Engineer at WSO2
>>>>>>> Mobile: 0714126293
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Tishan Dahanayakage
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> Mobile:+94 716481328
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *S. Suhothayan*
>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>> *WSO2 Inc.* http://wso2.com
>>> lean . enterprise . middleware
>>>
>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ |
>>> twitter: http://twitter.com/suhothayan |
>>> linked-in: http://lk.linkedin.com/in/suhothayan
>>>
>>
>>
>>
>>
>>


_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
