Hi Srinath,

Please find the comments inline.
On Thu, Mar 24, 2016 at 11:39 AM, Srinath Perera <[email protected]> wrote:

> Hi Sachith, Anjana,
>
> +1 for the backend model.
>
> Are we handling the case where the last run finished with 25 minutes of
> data processed? Basically, the next run has to re-run the last hour and
> update the value.

Yes. It will recalculate for that hour and update the value.

> When does the one-hour counting start? Is it from the moment the server
> starts? That will be probabilistic when you restart. I think we need to
> either start at a known place (midnight) or let the user configure it.

In the first run, all the available data is processed. After that, it takes
the last processed event's timestamp and computes its floor value
(timestamp - timestamp % 3600), which is used as the start of the time
windows.

> I am a bit concerned about the syntax, though. This only works for a very
> specific type of query (one that includes an aggregate and a group by).
> What happens if a user does this with a different query? Can we give a
> clear error message?

Currently the error messages are very generic. We will have to work on
improving them.

Thanks,
Sachith

> --Srinath
>
> On Mon, Mar 21, 2016 at 5:15 PM, Sachith Withana <[email protected]> wrote:
>
>> Hi all,
>>
>> We are adding incremental processing capability to Spark in DAS.
>> As the first stage, we added time slicing to Spark execution.
>>
>> Here's a quick introduction to it.
>>
>> *Execution*:
>>
>> In the first run of the script, it will process all the data in the given
>> table and store the last processed event timestamp.
>> From the next run onwards, it will start processing from that stored
>> timestamp.
>>
>> The last processed event timestamp is not overridden with the new value
>> until the query that contains the data-processing part completes.
>> This ensures that the data processing for the query does not have to be
>> redone if the whole query fails.
>> This is ensured by adding a commit query after the main query.
>> Refer to the Syntax section for an example.
>>
>> *Syntax*:
>>
>> In the Spark script, incremental processing support has to be specified
>> per table; this is done in the CREATE TEMPORARY TABLE line.
>>
>> ex:
>>
>> CREATE TEMPORARY TABLE T1 USING CarbonAnalytics OPTIONS (tableName "test",
>> *incrementalProcessing "T1,3600"*);
>>
>> INSERT INTO T2 SELECT username, age FROM T1 GROUP BY age;
>>
>> INC_TABLE_COMMIT T1;
>>
>> The last line ensures that the processing completed successfully and then
>> replaces the lastProcessed timestamp with the new one.
>>
>> *TimeWindow*:
>>
>> To do the incremental processing, the user has to provide the time window
>> in which the data is processed.
>> In the above example, the data is summarized in *1 hour* time windows.
>>
>> *WindowOffset*:
>>
>> Events belonging to a previously processed time window might arrive late.
>> To account for that, we have added an optional parameter that allows
>> processing the immediately preceding time windows as well (it acts as a
>> buffer).
>> ex: If this is set to 1, then apart from the to-be-processed data, the
>> data belonging to the previously processed time window is also taken for
>> processing.
>>
>> *Limitations*:
>>
>> Currently, multiple time windows cannot be specified per temporary table
>> in the same script.
>> That would have to be done using different temporary tables.
>>
>> *Future Improvements*:
>>
>> - Add aggregation function support for incremental processing
>>
>> Thanks,
>> Sachith
>>
>> --
>> Sachith Withana
>> Software Engineer; WSO2 Inc.; http://wso2.com
>> E-mail: sachith AT wso2.com
>> M: +94715518127
>> Linked-In: https://lk.linkedin.com/in/sachithwithana
>
>
> --
> ============================
> Srinath Perera, Ph.D.
> http://people.apache.org/~hemapani/
> http://srinathsview.blogspot.com/

--
Sachith Withana
Software Engineer; WSO2 Inc.; http://wso2.com
E-mail: sachith AT wso2.com
M: +94715518127
Linked-In: https://lk.linkedin.com/in/sachithwithana
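[Editor's sketch] The window-floor arithmetic described in the thread above (timestamp - timestamp % 3600, with a partially processed hour recalculated on the next run) can be sketched as follows. The function names are illustrative, not from the DAS codebase, and timestamps are in seconds for simplicity:

```python
# Sketch of the incremental-processing window arithmetic from the thread.
# Names are illustrative; a one-hour (3600-second) window is assumed,
# matching the "T1,3600" example.

WINDOW_SIZE = 3600

def window_start(timestamp):
    # Floor the timestamp to the start of its time window:
    # timestamp - timestamp % 3600, as described in the thread.
    return timestamp - timestamp % WINDOW_SIZE

def next_run_start(last_processed_timestamp):
    # The next run re-processes from the start of the window containing
    # the last processed event, so a partially processed hour (e.g. only
    # 25 minutes of data) is recalculated and its value updated.
    return window_start(last_processed_timestamp)

# Last processed event at 01:25:00 (5100 s): the next run starts
# again from 01:00:00 (3600 s).
print(next_run_start(5100))  # -> 3600
```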
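[Editor's sketch] The effect of the optional WindowOffset parameter can be sketched the same way; again the names are illustrative and one-hour windows in seconds are assumed:

```python
# Sketch of the WindowOffset buffer described in the thread: with an
# offset of N, the N previously processed time windows are re-processed
# as well, to pick up late-arriving events. Names are illustrative,
# not from the DAS codebase.

WINDOW_SIZE = 3600  # one-hour windows, matching the "T1,3600" example

def processing_start(last_processed_timestamp, window_offset=0):
    # Start of the range to (re-)process on the next run. With
    # window_offset=0 only the window containing the last processed
    # event is recalculated; with window_offset=1 the immediately
    # previous window is included too, acting as a buffer for late events.
    current_window = last_processed_timestamp - last_processed_timestamp % WINDOW_SIZE
    return current_window - window_offset * WINDOW_SIZE

# Last processed event at 02:10:00 (7800 s):
print(processing_start(7800))     # no buffer -> 7200 (02:00:00)
print(processing_start(7800, 1))  # offset 1  -> 3600 (01:00:00)
```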
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
