Hi Srinath,

Please find the comments inline.

On Thu, Mar 24, 2016 at 11:39 AM, Srinath Perera <[email protected]> wrote:

> Hi Sachith, Anjana,
>
> +1 for the backend model.
>
> Are we handling the case when last run was done, 25 minutes of data is
> processed. Basically, next run has to re-run last hour and update the value.
>

Yes. It will recalculate for that hour and will update the value.


>
> When does one hour counting starts? is it from the moment server starts?
> That will be probabilistic when you restart. I think we need to either
> start with know place ( midnight) or let user configure it.
>

In the first run all the data available are processed.
After that it calculates the floor of last processed events' timestamp and
gets the floor value (timestamp - timestamp%3600), that would be used as
the start of the time windows.

>
> I am bit concern about the syntax though. This only works for very
> specific type of queries ( that includes aggregate and a group by). What
> happen if user do this with a different query? Can we give clear error
> message?
>

Currently the error messages are very generic. We will have to work on it
to improve those messages.

Thanks,
Sachith


>
> --Srinath
>
> On Mon, Mar 21, 2016 at 5:15 PM, Sachith Withana <[email protected]> wrote:
>
>> Hi all,
>>
>> We are adding incremental processing capability to Spark in DAS.
>> As the first stage, we added time slicing to Spark execution.
>>
>> Here's a quick introduction into that.
>>
>> *Execution*:
>>
>> In the first run of the script, it will process all the data in the given
>> table and store the last processed event timestamp.
>> Then from the next run onwards it would start processing starting from
>> that stored timestamp.
>>
>> Until the query that contains the data processing part, completes, last
>> processed event timestamp would not be overridden with the new value.
>> This is to ensure that the data processing for the query wouldn't have to
>> be done again, if the whole query fails.
>> This is ensured by adding a commit query after the main query.
>> Refer to the Syntax section for the example.
>>
>> *Syntax*:
>>
>> In the spark script, incremental processing support has to be specified
>> per table, this would happen in the create temporary table line.
>>
>> ex: CREATE TEMPORARY TABLE T1 USING CarbonAnalytics options (tableName
>> "test",
>> *incrementalProcessing "T1,3600");*
>>
>> INSERT INTO T2 SELECT username, age GROUP BY age FROM T1;
>>
>> INC_TABLE_COMMIT T1;
>>
>> The last line is where it ensures the processing took place successfully
>> and replaces the lastProcessed timestamp with the new one.
>>
>> *TimeWindow*:
>>
>> To do the incremental processing, the user has to provide the time window
>> per which the data would be processed.
>> In the above example. the data would be summarized in *1 hour *time
>> windows.
>>
>> *WindowOffset*:
>>
>> Events might arrive late that would belong to a previous processed time
>> window. To account to that, we have added an optional parameter that would
>> allow to
>> process immediately previous time windows as well ( acts like a buffer).
>> ex: If this is set to 1, apart from the to-be-processed data, data
>> related to the previously processed time window will also be taken for
>> processing.
>>
>>
>> *Limitations*:
>>
>> Currently, multiple time windows cannot be specified per temporary table
>> in the same script.
>> It would have to be done using different temporary tables.
>>
>>
>>
>> *Future Improvements:*
>> - Add aggregation function support for incremental processing
>>
>>
>> Thanks,
>> Sachith
>> --
>> Sachith Withana
>> Software Engineer; WSO2 Inc.; http://wso2.com
>> E-mail: sachith AT wso2.com
>> M: +94715518127
>> Linked-In: <http://goog_416592669>
>> https://lk.linkedin.com/in/sachithwithana
>>
>
>
>
> --
> ============================
> Srinath Perera, Ph.D.
>    http://people.apache.org/~hemapani/
>    http://srinathsview.blogspot.com/
>



-- 
Sachith Withana
Software Engineer; WSO2 Inc.; http://wso2.com
E-mail: sachith AT wso2.com
M: +94715518127
Linked-In: <http://goog_416592669>https://lk.linkedin.com/in/sachithwithana
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Reply via email to