Hi Tishan,

Yes. Assuming the batch size is 5 and the time window is 20 mins, if 10
events arrive within the last 5 mins, only 5 of them would be processed
because of the batch size constraint (even though all 10 would be processed
if time alone were considered). Separate implementations would work for the
majority of scenarios, since usually only time OR length is applicable, not
both. However, two separate implementations would cause trouble in
situations where both time and length are important (equivalent to an AND
of the two constraints). If our requirement is to apply only one of the two
constraints, we can use a very large value for the other parameter (e.g. if
we only need to limit the number of events with a 1 sec time constraint, we
can specify 1,000,000 as the batch size, assuming we know in advance that
1,000,000 events would never arrive within 1 sec). IMHO neither of the two
options (separate or single implementation) is perfect for every scenario.
However, as I understand it, a single implementation would address more
cases. What's your opinion on this?
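To make the AND semantics concrete, here is a minimal Python sketch of a
window that applies both constraints on every arrival. The names
`TimeAndLengthWindow`, `Event`, `duration` and `batch_size` are hypothetical
and not the timeseries extension's actual code; timestamps are assumed to be
in seconds and to arrive in order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # arrival time in seconds (assumed non-decreasing)
    value: float


class TimeAndLengthWindow:
    """Hypothetical window that keeps an event only if it satisfies BOTH
    constraints: it is among the latest `batch_size` events AND it arrived
    no more than `duration` seconds before the newest event."""

    def __init__(self, duration: float, batch_size: int):
        self.duration = duration
        self.batch_size = batch_size
        self.events = deque()

    def add(self, event: Event) -> list:
        self.events.append(event)
        # Length constraint: evict the oldest events beyond batch_size.
        while len(self.events) > self.batch_size:
            self.events.popleft()
        # Time constraint: evict events older than the time window.
        cutoff = event.timestamp - self.duration
        while self.events and self.events[0].timestamp < cutoff:
            self.events.popleft()
        return list(self.events)


# Scenario from the reply: 10 events within the last 5 mins,
# a 20 min time window and a batch size of 5.
both = TimeAndLengthWindow(duration=20 * 60, batch_size=5)
# "Time only": a batch size so large that it never binds.
time_only = TimeAndLengthWindow(duration=20 * 60, batch_size=1_000_000)
for i in range(10):
    event = Event(timestamp=i * 30, value=float(i))  # one event every 30 s
    in_both = both.add(event)
    in_time_only = time_only.add(event)

print(len(in_both), len(in_time_only))  # 5 10
```

With both constraints active only the latest 5 events survive, while the
very large batch size effectively turns the same class into a pure time
window, which is the workaround suggested above.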

Thanks

On Thu, Jun 2, 2016 at 10:14 AM, Charini Nanayakkara <[email protected]>
wrote:

> Hi All,
>
> I have planned to extend the existing Regression Function by adding a time
> parameter. Regression is a functionality available in the Siddhi stream
> processor extension known as timeseries. In the current implementation, the
> regression function consumes two or more parameters and performs regression
> as follows.
>
> The mandatory parameters are the dependent attribute Y and the independent
> attribute(s) X1, X2, ..., Xn. For simple linear regression, only one
> independent attribute is given. Two or more independent attributes are
> given for multiple linear regression.
>
> timeseries:regress(Y, X1, X2, ..., Xn)
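For the simple linear regression case (one independent attribute), the
textbook ordinary least squares fit of y = b0 + b1 * x can be sketched in
Python as below. This is an illustration of the technique only, assuming
plain OLS; it is not necessarily how the timeseries extension computes its
coefficients, and `simple_linear_regression` is a hypothetical name. The
argument order (Y first, then X) mirrors regress(Y, X1).

```python
def simple_linear_regression(ys, xs):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


print(simple_linear_regression([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0]))  # (1.0, 2.0)
```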
>
> The three optional parameters are the calculation interval, batch size and
> confidence interval (ci). If they are not specified, default values are
> assumed.
>
> timeseries:regress(calcInterval, batchSize, ci, Y, X1, X2, ..., Xn)
>
> Batch size works as a length window in this implementation, which allows
> one to restrict the number of events considered when executing regression
> in real time. For example, if the batch size is 5, only the latest 5 events
> (the current event and the 4 events prior to it) are used for performing
> regression.
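A length window of this kind can be sketched in Python with a bounded
deque, which discards the oldest element automatically. The function name
`length_windows` is hypothetical and this is not the extension's code; it
only shows which events each regression run would see for a given batch
size.

```python
from collections import deque


def length_windows(values, batch_size):
    """Yield, for each arriving value, the latest `batch_size` values
    (the current one plus up to batch_size - 1 prior ones)."""
    window = deque(maxlen=batch_size)  # deque drops the oldest automatically
    for value in values:
        window.append(value)
        yield list(window)


snapshots = list(length_windows(range(1, 8), batch_size=5))
print(snapshots[-1])  # [3, 4, 5, 6, 7]
```

On the 7th event with a batch size of 5, only events 3 to 7 remain, i.e.
the current event and the 4 events prior to it.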
>
> *This suggested extension would allow the user to restrict the number of
> events based on a time window as well, rather than based on length only.
> Therefore, once my task is complete, the regression function would consume
> duration as an additional parameter.*
>
> *timeseries:regress(calcInterval, duration, batchSize, ci, Y, X1, X2, ...,
> Xn)*
>
> Here the parameter 'duration' would consist of two parts, where the first
> part specifies the number and the second part specifies the unit (e.g. 2
> sec, 5 mins, 7 days). On arrival of each event, the past events considered
> for performing regression would be determined by this 'duration' (i.e. if a
> new event arrives at 10.00 a.m. and the duration is 5 mins, only the events
> which arrived between 9.55 a.m. and 10.00 a.m. are considered for
> regression).
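The duration handling described above can be sketched in Python: parse a
"<number> <unit>" string into a `timedelta`, then keep the arrivals that
fall in the closed interval [now - duration, now]. The unit keywords in
`_UNITS` and the helper names are assumptions for illustration; the
extension may accept a different set of units or treat the boundary
differently.

```python
from datetime import datetime, timedelta

# Hypothetical unit keywords; the extension may accept a different set.
_UNITS = {
    "sec": "seconds", "secs": "seconds", "second": "seconds", "seconds": "seconds",
    "min": "minutes", "mins": "minutes", "minute": "minutes", "minutes": "minutes",
    "hour": "hours", "hours": "hours",
    "day": "days", "days": "days",
}


def parse_duration(text: str) -> timedelta:
    """Parse a '<number> <unit>' string such as '5 mins' into a timedelta."""
    amount, unit = text.split()
    return timedelta(**{_UNITS[unit.lower()]: int(amount)})


def events_in_window(arrivals, now, duration):
    """Keep arrival times in the closed interval [now - duration, now]."""
    start = now - duration
    return [t for t in arrivals if start <= t <= now]


day = datetime(2016, 6, 2)
arrivals = [day.replace(hour=9, minute=m) for m in (50, 55, 58)]
arrivals.append(day.replace(hour=10))
selected = events_in_window(arrivals, day.replace(hour=10), parse_duration("5 mins"))
print([t.strftime("%H:%M") for t in selected])  # ['09:55', '09:58', '10:00']
```

The event at 9.50 a.m. falls outside the 5 min window and is dropped, which
matches the 9.55 a.m. to 10.00 a.m. example.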
>
> Suggestions and comments are most welcome.
>
> Thank you.
>
> --
> Charini Vimansha Nanayakkara
> Software Engineer at WSO2
> Mobile: 0714126293
>
>


-- 
Charini Vimansha Nanayakkara
Software Engineer at WSO2
Mobile: 0714126293
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
