Sanjiva, Suho and myself had a chat We thought it is best not to use window.time .. but add something else as this is different. (hence "with" here).
With the query we want to tell, "from the Event table t, process last Kh of data every fh time period" Syntax we came up with is from TTable#with.time( ..) every 2h select ... Details - Instead of every 2h, users can also use cron expression with cron(...). - Instead of with.time .. users can use with.length( ..) with.sql( ..) etc Actual implementation would run in Spark. examples from TTable#with.time(24h) every 2h select sensor, avg(time) from TTable#with.time(24h)#myMLAlgo( ..) every 2h select .. On Mon, Nov 24, 2014 at 6:25 AM, Srinath Perera <[email protected]> wrote: > > > On Sat, Nov 22, 2014 at 10:34 PM, Lasantha Fernando <[email protected]> > wrote: > >> Hi all, >> >> +1 for the idea. IMO using the *from* clause on event tables to do batch >> processing makes sense and fits the current Siddhi model nicely. >> >> For supporting the *from* clause when the timestamp is not given, can we >> add a timestamp internally to the table if the event table definition does >> not contain a timestamp attribute? >> >> i.e. >> >> - If the event table definition contains a timestamp attribute, that will >> be used (If not, we let the user specify what should be picked up as the >> timestamp attribute) >> > Yes, all BAM data has timestamp. But user can override with an other > column if he likes. > >> - If the timestamp is not available, and not specified either, we add a >> timestamp internally. >> > - At the time of inserting the event, we check whether the timestamp is >> there and if not add the timestamp at the time of inserting the event to >> the table. >> >> The timestamp attribute will have to be treated a bit specially. >> > One option is when there is no timestamp, all the data matches. (reverting > to what we have now in event tables). e.g. "from EventTable ... " is > allowed that matches all the data, but "from EventTable#window.batch(5m)" > will throw an error as there is no timestamp. > >> >> I believe that this approach does have scenarios/use-cases where the >> behaviour would be not as expected. (e.g. if timestamp is null or corrupted >> for only a selected set of events; also setting the timestamp of the event >> to the time at which it was received by the server does not make sense at >> all times). Just throwing out the idea to see if it is feasible.. :-) >> > That users can handle by giving his own time stamp feild. > If timestamp is null or corrupted, we need to log an error IMO. > >> >> Thanks, >> Lasantha >> >> On 22 November 2014 at 08:00, Srinath Perera <[email protected]> wrote: >> >>> IMO, use of window batch is consistant (My argument is, it is >>> interpreted as batch process when used with event tables). But lets think >>> more on that. >>> >>> --Srinath >>> >>> On Sat, Nov 22, 2014 at 7:56 AM, Sriskandarajah Suhothayan < >>> [email protected]> wrote: >>> >>>> +1 >>>> I like the idea, only worried about the use of *#window.batch()* >>>> How about using *WeatherStream#pull#window.time(24h) *instead? >>>> >>>> Suho >>>> >>>> On Fri, Nov 21, 2014 at 8:33 PM, Srinath Perera <[email protected]> >>>> wrote: >>>> >>>>> Hi All, >>>>> >>>>> >>>>> I think I have finally figured out a way do the Paul's idea (See the >>>>> thread "Unifying CEP and BAM Languages" >>>>> http://mail.wso2.org/mailarchive/architecture/2014-May/016366.html >>>>> for more details). >>>>> >>>>> >>>>> Currently event tables in Siddhi are backed by a disk and can be used >>>>> to joins >>>>> >>>>> However, you must use it with a stream and it is triggered from a >>>>> stream. For example currently *from WeatherStream insert into ..* is >>>>> not defined. *Idea is to use that for batch processing!* >>>>> >>>>> Proposal is to extend event tables definition >>>>> >>>>> 1. Add an timestamp field to event tables (Then BAM stream also >>>>> become a event table) >>>>> >>>>> 2. *Define “from” operation on event table to cover batch processing*. >>>>> For example >>>>> >>>>> from EvetTable#window.batch(2h) avg(pressure) as avgPressure will run >>>>> as batch job with 2h data using data in the event table. (We will execute >>>>> this in Spark using Siddhi engine). If #window.batch(2h) is omitted, >>>>> #window.batch(5m) is assumed. >>>>> >>>>> 3. You can define an event table on top of DB, disk, Cassandra, hdfs, >>>>> BAM stream stored etc .. >>>>> >>>>> Let me take an example >>>>> >>>>> Say you want to calculate avg and stddev of pressure once every 24h as >>>>> a batch job, and raise an alarm if current pressure is less than 3 stddev >>>>> from the mean calculated in batch process. (this is a extended Lambda >>>>> architecture scenario) Then you can do all this with Siddhi, and query >>>>> will >>>>> look like following. >>>>> >>>>> *define eventTable WeatherTable using BAMStream … * >>>>> >>>>> *define eventTable WeatherStatsTable using BAMStream … * >>>>> >>>>> *//batch job* >>>>> >>>>> *from **WeatherTable**#window.batch(24h) stddev(pressure) , avg >>>>> (pressure)* >>>>> >>>>> * insert into **WeatherStatsTable**; * >>>>> >>>>> *//use results from batch job in realtime * >>>>> >>>>> *from WeatherStream as p join **WeatherStatsTable**#window.last() as >>>>> s * >>>>> >>>>> * on pressure < s.mean -2*s.stddev* >>>>> >>>>> * insert into WeatherAlarmStream; * >>>>> >>>>> First query runs once every 24 hours and calculate mean and stddev and >>>>> write to disk. That value is joined against the live stream as they come >>>>> in >>>>> via join. >>>>> >>>>> Few more rules (and there are more details that we need to figure out) >>>>> >>>>> >>>>> 1. You read from event table, then it runs as batch processes >>>>> >>>>> 2. If you join with event table, it works as now. However, you can >>>>> define a window when joining on top of event tables as well. e.g. >>>>> WeatherStats#window.batch(5m) means takes events came in last 5 mins. >>>>> >>>>> 3. We need to define how it behaves when timestamp field is not >>>>> defined. Best is to only support *joins* and not support *from* in >>>>> that case. >>>>> >>>>> 4. When processing batches, it runs in parallel if partitions are >>>>> defined. For example, if you want to calculate mean in map reduce style, >>>>> it >>>>> will look like following. >>>>> >>>>> *define partition StationParition WeatherStream.stationID; * >>>>> >>>>> *from WeatherStream#window.batch(24h) avg(pressure) * >>>>> >>>>> * insert into WeatherMeanStream using StationParition; * >>>>> >>>>> If no partitions, it will run sequentially (We can improve on this >>>>> later). >>>>> >>>>> For execution, we want to init siddhi within Spark, and run thing in >>>>> parallel using spark, but actual evaluation will done by Siddhi. >>>>> >>>>> 5. Users can define other windows like sliding windows with event >>>>> tables. However, Siddhi will read data from disk once every 5 minutes or >>>>> so. So results will be the same, however, it might come bit later than >>>>> with >>>>> streams. >>>>> >>>>> >>>>> Please comment >>>>> >>>>> Thanks >>>>> >>>>> Srinath >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ============================ >>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera >>>>> Site: http://people.apache.org/~hemapani/ >>>>> Photos: http://www.flickr.com/photos/hemapani/ >>>>> Phone: 0772360902 >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> *S. Suhothayan* >>>> Technical Lead & Team Lead of WSO2 Complex Event Processor >>>> *WSO2 Inc. *http://wso2.com >>>> * <http://wso2.com/>* >>>> lean . enterprise . middleware >>>> >>>> >>>> *cell: (+94) 779 756 757 <%28%2B94%29%20779%20756%20757> | blog: >>>> http://suhothayan.blogspot.com/ <http://suhothayan.blogspot.com/>twitter: >>>> http://twitter.com/suhothayan <http://twitter.com/suhothayan> | linked-in: >>>> http://lk.linkedin.com/in/suhothayan >>>> <http://lk.linkedin.com/in/suhothayan>* >>>> >>> >>> >>> >>> -- >>> ============================ >>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera >>> Site: http://people.apache.org/~hemapani/ >>> Photos: http://www.flickr.com/photos/hemapani/ >>> Phone: 0772360902 >>> >>> _______________________________________________ >>> Architecture mailing list >>> [email protected] >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >> >> >> -- >> *Lasantha Fernando* >> Software Engineer - Data Technologies Team >> WSO2 Inc. http://wso2.com >> >> email: [email protected] >> mobile: (+94) 71 5247551 >> > > > > -- > ============================ > Blog: http://srinathsview.blogspot.com twitter:@srinath_perera > Site: http://people.apache.org/~hemapani/ > Photos: http://www.flickr.com/photos/hemapani/ > Phone: 0772360902 > -- ============================ Blog: http://srinathsview.blogspot.com twitter:@srinath_perera Site: http://people.apache.org/~hemapani/ Photos: http://www.flickr.com/photos/hemapani/ Phone: 0772360902
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
