Hi all,

We are adding incremental processing capability to Spark in DAS.
As the first stage, we added time slicing to Spark execution.

Here's a quick introduction into that.

*Execution*:

In the first run of the script, it will process all the data in the given
table and store the last processed event timestamp.
Then from the next run onwards it would start processing starting from that
stored timestamp.

Until the query that contains the data processing part, completes, last
processed event timestamp would not be overridden with the new value.
This is to ensure that the data processing for the query wouldn't have to
be done again, if the whole query fails.
This is ensured by adding a commit query after the main query.
Refer to the Syntax section for the example.

*Syntax*:

In the spark script, incremental processing support has to be specified per
table, this would happen in the create temporary table line.

ex: CREATE TEMPORARY TABLE T1 USING CarbonAnalytics options (tableName
"test",
*incrementalProcessing "T1,3600");*

INSERT INTO T2 SELECT username, age GROUP BY age FROM T1;

INC_TABLE_COMMIT T1;

The last line is where it ensures the processing took place successfully
and replaces the lastProcessed timestamp with the new one.

*TimeWindow*:

To do the incremental processing, the user has to provide the time window
per which the data would be processed.
In the above example. the data would be summarized in *1 hour *time
windows.

*WindowOffset*:

Events might arrive late that would belong to a previous processed time
window. To account to that, we have added an optional parameter that would
allow to
process immediately previous time windows as well ( acts like a buffer).
ex: If this is set to 1, apart from the to-be-processed data, data related
to the previously processed time window will also be taken for processing.


*Limitations*:

Currently, multiple time windows cannot be specified per temporary table in
the same script.
It would have to be done using different temporary tables.



*Future Improvements:*
- Add aggregation function support for incremental processing


Thanks,
Sachith
-- 
Sachith Withana
Software Engineer; WSO2 Inc.; http://wso2.com
E-mail: sachith AT wso2.com
M: +94715518127
Linked-In: <http://goog_416592669>https://lk.linkedin.com/in/sachithwithana
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Reply via email to