Hi all,

We are adding incremental processing capability to Spark in DAS. As the first stage, we added time slicing to Spark execution.
Here's a quick introduction to it.

*Execution*:

In the first run of the script, it processes all the data in the given table and stores the last processed event timestamp. From the next run onwards, processing starts from that stored timestamp.

The last processed event timestamp is not overwritten with the new value until the query that contains the data processing part completes. This ensures that if the whole query fails, no data is skipped: the next run starts again from the previously stored timestamp. This is done by adding a commit query after the main query. Refer to the Syntax section for an example.

*Syntax*:

In the Spark script, incremental processing support has to be specified per table, in the CREATE TEMPORARY TABLE line.

ex:

    CREATE TEMPORARY TABLE T1 USING CarbonAnalytics options (tableName "test", incrementalProcessing "T1,3600");

    INSERT INTO T2 SELECT username, age FROM T1 GROUP BY age;

    INC_TABLE_COMMIT T1;

The last line confirms that the processing completed successfully and replaces the lastProcessed timestamp with the new one.

*TimeWindow*:

To do the incremental processing, the user has to provide the time window by which the data is processed. In the above example, the data is summarized in *1 hour* time windows.

*WindowOffset*:

Events that belong to a previously processed time window might arrive late. To account for that, we have added an optional parameter that allows the immediately preceding time windows to be processed as well (it acts like a buffer).

ex: If this is set to 1, apart from the to-be-processed data, data related to the previously processed time window is also taken for processing.

*Limitations*:

Currently, multiple time windows cannot be specified per temporary table in the same script. That has to be done using different temporary tables.
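To make the commit and window offset behaviour above more concrete, here is a minimal Python sketch. It is not the DAS implementation: the class `IncrementalTable`, its method names, and the in-memory list of event timestamps are all hypothetical, and aligning the start point to a window boundary before applying the offset is my assumption about how the buffer could work.

```python
from typing import List, Optional


class IncrementalTable:
    """Hypothetical in-memory stand-in for one incrementally processed table."""

    def __init__(self, window_seconds: int, window_offset: int = 0):
        self.window_seconds = window_seconds
        self.window_offset = window_offset
        # Committed last processed event timestamp (None before the first run).
        self.last_processed: Optional[int] = None
        # Candidate value recorded by read(); only applied by commit().
        self._pending: Optional[int] = None

    def start_timestamp(self) -> Optional[int]:
        """Return where the next run starts, or None to process everything."""
        if self.last_processed is None:
            return None
        # Assumption: align to the start of the window holding the stored
        # timestamp, then step back by `window_offset` whole windows.
        window_start = self.last_processed - self.last_processed % self.window_seconds
        return window_start - self.window_offset * self.window_seconds

    def read(self, event_timestamps: List[int]) -> List[int]:
        """Select the events for this run without touching the committed value."""
        start = self.start_timestamp()
        selected = [t for t in event_timestamps if start is None or t >= start]
        if selected:
            self._pending = max(selected)
        return selected

    def commit(self) -> None:
        """Plays the role of INC_TABLE_COMMIT: publish the pending timestamp."""
        if self._pending is not None:
            self.last_processed = self._pending
            self._pending = None


# Usage: a 1 hour window with an offset of one window.
t1 = IncrementalTable(window_seconds=3600, window_offset=1)
first_run = t1.read([100, 3700, 7300])   # first run selects every event
t1.commit()                              # only now is the timestamp stored
second_run = t1.read([100, 3700, 7300, 7500])
```

If `commit()` is never reached because the main query failed, `last_processed` keeps its old value, so the next run selects the same range again.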
*Future Improvements*:

- Add aggregation function support for incremental processing

Thanks,
Sachith

--
Sachith Withana
Software Engineer; WSO2 Inc.; http://wso2.com
E-mail: sachith AT wso2.com
M: +94715518127
Linked-In: https://lk.linkedin.com/in/sachithwithana
