On Sat, Nov 22, 2014 at 10:34 PM, Lasantha Fernando <[email protected]>
wrote:

> Hi all,
>
> +1 for the idea. IMO using the *from* clause on event tables to do batch
> processing makes sense and fits the current Siddhi model nicely.
>
> For supporting the *from* clause when the timestamp is not given, can we
> add a timestamp internally to the table if the event table definition does
> not contain a timestamp attribute?
>
> i.e.
>
> - If the event table definition contains a timestamp attribute, that will
> be used (If not, we let the user specify what should be picked up as the
> timestamp attribute)
>
Yes, all BAM data has a timestamp. But the user can override it with another
column if they like.
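For example, the override could look something like this (purely hypothetical syntax to illustrate the idea; the exact form, and the `timestamp(...)` qualifier itself, are open for discussion):

```
// Default: the table's own timestamp attribute is used
define eventTable WeatherTable using BAMStream;

// Hypothetical override: the user picks the recordedTime column
// to be treated as the timestamp for windowing
define eventTable WeatherTable using BAMStream timestamp(recordedTime);
```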

> - If the timestamp is not available, and not specified either, we add a
> timestamp internally.
>
> - At the time of inserting the event, we check whether the timestamp is
> there and if not add the timestamp at the time of inserting the event to
> the table.
>
> The timestamp attribute will have to be treated a bit specially.
>
One option is that when there is no timestamp, all the data matches
(reverting to what we have now in event tables). e.g. "from EventTable ... "
is allowed and matches all the data, but "from EventTable#window.batch(5m)"
will throw an error as there is no timestamp.
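Concretely, for a table defined without a timestamp, the proposed behaviour would be (a sketch using the query style from later in this thread; the table and stream names are just placeholders):

```
// Allowed: no timestamp needed, matches all the data in the table
from WeatherTable avg(pressure) as avgPressure
     insert into PressureStream;

// Error: a batch window cannot slice the data by time without a timestamp
from WeatherTable#window.batch(5m) avg(pressure) as avgPressure
     insert into PressureStream;
```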

>
> I believe that this approach does have scenarios/use-cases where the
> behaviour would not be as expected (e.g. if the timestamp is null or corrupted
> for only a selected set of events; also, setting the timestamp of the event
> to the time at which it was received by the server does not make sense at
> all times). Just throwing out the idea to see if it is feasible.. :-)
>
Users can handle that by specifying their own timestamp field.
If the timestamp is null or corrupted, we need to log an error IMO.

>
> Thanks,
> Lasantha
>
> On 22 November 2014 at 08:00, Srinath Perera <[email protected]> wrote:
>
>> IMO, use of window batch is consistent (my argument is, it is interpreted
>> as batch processing when used with event tables). But let's think more on that.
>>
>> --Srinath
>>
>> On Sat, Nov 22, 2014 at 7:56 AM, Sriskandarajah Suhothayan <[email protected]> wrote:
>>
>>> +1
>>> I like the idea, only worried about the use of *#window.batch()*
>>> How about using *WeatherStream#pull#window.time(24h) *instead?
>>>
>>> Suho
>>>
>>> On Fri, Nov 21, 2014 at 8:33 PM, Srinath Perera <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>>
>>>> I think I have finally figured out a way to do Paul's idea (see the
>>>> thread "Unifying CEP and BAM Languages"
>>>> http://mail.wso2.org/mailarchive/architecture/2014-May/016366.html for
>>>> more details).
>>>>
>>>>
>>>> Currently, event tables in Siddhi are backed by disk and can be used
>>>> for joins.
>>>>
>>>> However, you must use them with a stream, and they are triggered from a
>>>> stream. For example, currently *from WeatherStream insert into ..* is
>>>> not defined.  *The idea is to use that for batch processing!*
>>>>
>>>> The proposal is to extend the event table definition:
>>>>
>>>> 1. Add a timestamp field to event tables (then a BAM stream also becomes
>>>> an event table)
>>>>
>>>> 2. *Define the “from” operation on event tables to cover batch
>>>> processing*. For example,
>>>>
>>>> from EventTable#window.batch(2h) avg(pressure) as avgPressure will run
>>>> as a batch job over 2h of data in the event table. (We will execute
>>>> this in Spark using the Siddhi engine.) If #window.batch(2h) is omitted,
>>>> #window.batch(5m) is assumed.
>>>>
>>>> 3. You can define an event table on top of a DB, disk, Cassandra, HDFS,
>>>> a stored BAM stream, etc.
>>>>
>>>> Let me take an example.
>>>>
>>>> Say you want to calculate the avg and stddev of pressure once every 24h
>>>> as a batch job, and raise an alarm if the current pressure is less than
>>>> 3 stddev from the mean calculated in the batch process (this is an
>>>> extended Lambda architecture scenario). Then you can do all this with
>>>> Siddhi, and the queries will look like the following.
>>>>
>>>> *define eventTable WeatherTable using BAMStream … *
>>>>
>>>> *define eventTable  WeatherStatsTable using BAMStream … *
>>>>
>>>> *//batch job*
>>>>
>>>> *from WeatherTable#window.batch(24h) stddev(pressure) as stddev,
>>>> avg(pressure) as mean*
>>>>
>>>> *     insert into WeatherStatsTable; *
>>>>
>>>> *//use results from batch job in realtime *
>>>>
>>>> *from WeatherStream as p join WeatherStatsTable#window.last() as s*
>>>>
>>>> *          on pressure < s.mean - 2*s.stddev*
>>>>
>>>> *     insert into WeatherAlarmStream; *
>>>>
>>>> The first query runs once every 24 hours, calculates the mean and
>>>> stddev, and writes them to disk. Those values are then joined against
>>>> the live stream events as they come in.
>>>>
>>>> A few more rules (and there are more details that we need to figure out):
>>>>
>>>> 1. If you read from an event table, it runs as a batch process.
>>>>
>>>> 2. If you join with an event table, it works as it does now. However,
>>>> you can define a window when joining on top of event tables as well, e.g.
>>>> WeatherStatsTable#window.batch(5m) means take the events that came in
>>>> during the last 5 mins.
>>>>
>>>> 3. We need to define how it behaves when the timestamp field is not
>>>> defined. Best is to only support *joins* and not support *from* in
>>>> that case.
>>>>
>>>> 4. When processing batches, it runs in parallel if partitions are
>>>> defined. For example, if you want to calculate the mean in map-reduce
>>>> style, it will look like the following.
>>>>
>>>> *define partition StationPartition WeatherStream.stationID; *
>>>>
>>>> *from WeatherStream#window.batch(24h) avg(pressure) *
>>>>
>>>> *     insert into WeatherMeanStream using StationPartition;  *
>>>>
>>>> If no partitions are defined, it will run sequentially (we can improve
>>>> on this later).
>>>>
>>>> For execution, we want to initialize Siddhi within Spark and run things
>>>> in parallel using Spark, but the actual evaluation will be done by Siddhi.
>>>>
>>>> 5. Users can define other windows, like sliding windows, with event
>>>> tables. However, Siddhi will read data from disk only once every 5
>>>> minutes or so. The results will be the same, though they might come a
>>>> bit later than with streams.
>>>>
>>>>
>>>> Please comment
>>>>
>>>> Thanks
>>>>
>>>> Srinath
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ============================
>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>> Site: http://people.apache.org/~hemapani/
>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>> Phone: 0772360902
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> *S. Suhothayan*
>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>  *WSO2 Inc. *http://wso2.com
>>> * <http://wso2.com/>*
>>> lean . enterprise . middleware
>>>
>>>
>>> *cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ |
>>> twitter: http://twitter.com/suhothayan | linked-in:
>>> http://lk.linkedin.com/in/suhothayan*
>>>
>>
>>
>>
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> *Lasantha Fernando*
> Software Engineer - Data Technologies Team
> WSO2 Inc. http://wso2.com
>
> email: [email protected]
> mobile: (+94) 71 5247551
>



