On Sat, Nov 22, 2014 at 10:34 PM, Lasantha Fernando <[email protected]> wrote:
> Hi all, > > +1 for the idea. IMO using the *from* clause on event tables to do batch > processing makes sense and fits the current Siddhi model nicely. > > For supporting the *from* clause when the timestamp is not given, can we > add a timestamp internally to the table if the event table definition does > not contain a timestamp attribute? > > i.e. > > - If the event table definition contains a timestamp attribute, that will > be used (If not, we let the user specify what should be picked up as the > timestamp attribute) > Yes, all BAM data has timestamp. But user can override with an other column if he likes. > - If the timestamp is not available, and not specified either, we add a > timestamp internally. > - At the time of inserting the event, we check whether the timestamp is > there and if not add the timestamp at the time of inserting the event to > the table. > > The timestamp attribute will have to be treated a bit specially. > One option is when there is no timestamp, all the data matches. (reverting to what we have now in event tables). e.g. "from EventTable ... " is allowed that matches all the data, but "from EventTable#window.batch(5m)" will throw an error as there is no timestamp. > > I believe that this approach does have scenarios/use-cases where the > behaviour would be not as expected. (e.g. if timestamp is null or corrupted > for only a selected set of events; also setting the timestamp of the event > to the time at which it was received by the server does not make sense at > all times). Just throwing out the idea to see if it is feasible.. :-) > That users can handle by giving his own time stamp feild. If timestamp is null or corrupted, we need to log an error IMO. > > Thanks, > Lasantha > > On 22 November 2014 at 08:00, Srinath Perera <[email protected]> wrote: > >> IMO, use of window batch is consistant (My argument is, it is interpreted >> as batch process when used with event tables). But lets think more on that. >> >> --Srinath >> >> On Sat, Nov 22, 2014 at 7:56 AM, Sriskandarajah Suhothayan <[email protected] >> > wrote: >> >>> +1 >>> I like the idea, only worried about the use of *#window.batch()* >>> How about using *WeatherStream#pull#window.time(24h) *instead? >>> >>> Suho >>> >>> On Fri, Nov 21, 2014 at 8:33 PM, Srinath Perera <[email protected]> >>> wrote: >>> >>>> Hi All, >>>> >>>> >>>> I think I have finally figured out a way do the Paul's idea (See the >>>> thread "Unifying CEP and BAM Languages" >>>> http://mail.wso2.org/mailarchive/architecture/2014-May/016366.html for >>>> more details). >>>> >>>> >>>> Currently event tables in Siddhi are backed by a disk and can be used >>>> to joins >>>> >>>> However, you must use it with a stream and it is triggered from a >>>> stream. For example currently *from WeatherStream insert into ..* is >>>> not defined. *Idea is to use that for batch processing!* >>>> >>>> Proposal is to extend event tables definition >>>> >>>> 1. Add an timestamp field to event tables (Then BAM stream also become >>>> a event table) >>>> >>>> 2. *Define “from” operation on event table to cover batch processing*. >>>> For example >>>> >>>> from EvetTable#window.batch(2h) avg(pressure) as avgPressure will run >>>> as batch job with 2h data using data in the event table. (We will execute >>>> this in Spark using Siddhi engine). If #window.batch(2h) is omitted, >>>> #window.batch(5m) is assumed. >>>> >>>> 3. You can define an event table on top of DB, disk, Cassandra, hdfs, >>>> BAM stream stored etc .. >>>> >>>> Let me take an example >>>> >>>> Say you want to calculate avg and stddev of pressure once every 24h as >>>> a batch job, and raise an alarm if current pressure is less than 3 stddev >>>> from the mean calculated in batch process. (this is a extended Lambda >>>> architecture scenario) Then you can do all this with Siddhi, and query will >>>> look like following. >>>> >>>> *define eventTable WeatherTable using BAMStream … * >>>> >>>> *define eventTable WeatherStatsTable using BAMStream … * >>>> >>>> *//batch job* >>>> >>>> *from **WeatherTable**#window.batch(24h) stddev(pressure) , avg >>>> (pressure)* >>>> >>>> * insert into **WeatherStatsTable**; * >>>> >>>> *//use results from batch job in realtime * >>>> >>>> *from WeatherStream as p join **WeatherStatsTable**#window.last() as >>>> s * >>>> >>>> * on pressure < s.mean -2*s.stddev* >>>> >>>> * insert into WeatherAlarmStream; * >>>> >>>> First query runs once every 24 hours and calculate mean and stddev and >>>> write to disk. That value is joined against the live stream as they come in >>>> via join. >>>> >>>> Few more rules (and there are more details that we need to figure out) >>>> >>>> 1. You read from event table, then it runs as batch processes >>>> >>>> 2. If you join with event table, it works as now. However, you can >>>> define a window when joining on top of event tables as well. e.g. >>>> WeatherStats#window.batch(5m) means takes events came in last 5 mins. >>>> >>>> 3. We need to define how it behaves when timestamp field is not >>>> defined. Best is to only support *joins* and not support *from* in >>>> that case. >>>> >>>> 4. When processing batches, it runs in parallel if partitions are >>>> defined. For example, if you want to calculate mean in map reduce style, it >>>> will look like following. >>>> >>>> *define partition StationParition WeatherStream.stationID; * >>>> >>>> *from WeatherStream#window.batch(24h) avg(pressure) * >>>> >>>> * insert into WeatherMeanStream using StationParition; * >>>> >>>> If no partitions, it will run sequentially (We can improve on this >>>> later). >>>> >>>> For execution, we want to init siddhi within Spark, and run thing in >>>> parallel using spark, but actual evaluation will done by Siddhi. >>>> >>>> 5. Users can define other windows like sliding windows with event >>>> tables. However, Siddhi will read data from disk once every 5 minutes or >>>> so. So results will be the same, however, it might come bit later than with >>>> streams. >>>> >>>> >>>> Please comment >>>> >>>> Thanks >>>> >>>> Srinath >>>> >>>> >>>> >>>> >>>> -- >>>> ============================ >>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera >>>> Site: http://people.apache.org/~hemapani/ >>>> Photos: http://www.flickr.com/photos/hemapani/ >>>> Phone: 0772360902 >>>> >>> >>> >>> >>> -- >>> >>> *S. Suhothayan* >>> Technical Lead & Team Lead of WSO2 Complex Event Processor >>> *WSO2 Inc. *http://wso2.com >>> * <http://wso2.com/>* >>> lean . enterprise . middleware >>> >>> >>> *cell: (+94) 779 756 757 <%28%2B94%29%20779%20756%20757> | blog: >>> http://suhothayan.blogspot.com/ <http://suhothayan.blogspot.com/>twitter: >>> http://twitter.com/suhothayan <http://twitter.com/suhothayan> | linked-in: >>> http://lk.linkedin.com/in/suhothayan <http://lk.linkedin.com/in/suhothayan>* >>> >> >> >> >> -- >> ============================ >> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera >> Site: http://people.apache.org/~hemapani/ >> Photos: http://www.flickr.com/photos/hemapani/ >> Phone: 0772360902 >> >> _______________________________________________ >> Architecture mailing list >> [email protected] >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> > > > -- > *Lasantha Fernando* > Software Engineer - Data Technologies Team > WSO2 Inc. http://wso2.com > > email: [email protected] > mobile: (+94) 71 5247551 > -- ============================ Blog: http://srinathsview.blogspot.com twitter:@srinath_perera Site: http://people.apache.org/~hemapani/ Photos: http://www.flickr.com/photos/hemapani/ Phone: 0772360902
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
