[ https://issues.apache.org/jira/browse/SAMZA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228898#comment-14228898 ]

Milinda Lakmal Pathirage commented on SAMZA-390:
------------------------------------------------

Another interesting paper I found was "Query Languages and Data Models for 
Database Sequences and Data Streams" [1], which proposes a different way of 
handling window queries over streams using user-defined aggregates (UDAs). The 
authors first introduce the notion of nonblocking (NB) queries and 
NB-completeness. They also show that relational algebra is not NB-complete 
(it's well known that we can't support blocking operations such as ALL, EXCEPT, 
and NOT IN over a stream without a window operator). Instead of using a window 
operator like 'S [Rows 5]', they propose using UDAs like the following to do 
window computations.

AGGREGATE tumble_avg(Next Int) : Real
{
    TABLE state(tsum Int, cnt Int);
    INITIALIZE : {
        INSERT INTO state VALUES (Next, 1)
    }
    ITERATE : {
        UPDATE state
            SET tsum=tsum+Next, cnt=cnt+1;
        INSERT INTO RETURN
            SELECT tsum/cnt FROM state
            WHERE cnt % 200 = 0;
        UPDATE state SET tsum=0, cnt=0
            WHERE cnt % 200 = 0
    }
    TERMINATE : { }
}
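To make the INITIALIZE / ITERATE / TERMINATE semantics concrete, here is a rough 
Python model of the tumble_avg aggregate above. It is only a sketch under my own 
assumptions, not the paper's implementation: the names TumbleAvg, initialize, 
iterate, terminate and run_uda are hypothetical, the window size is a parameter 
instead of the hard-coded 200, and I use true division where the UDA divides Ints.

```python
import itertools
from typing import Iterable, Iterator, List


class TumbleAvg:
    """Hypothetical Python model of the tumble_avg UDA: emits the average of
    each non-overlapping (tumbling) window of `size` tuples."""

    def __init__(self, size: int = 200) -> None:
        # Like the UDA, INITIALIZE never checks the window condition,
        # so a window of 1 would misbehave; require at least 2.
        if size < 2:
            raise ValueError("size must be at least 2")
        self.size = size
        self.tsum = 0  # columns of the UDA's local TABLE state(tsum, cnt)
        self.cnt = 0

    def initialize(self, nxt: int) -> List[float]:
        # INITIALIZE: INSERT INTO state VALUES (Next, 1); emits nothing.
        self.tsum, self.cnt = nxt, 1
        return []

    def iterate(self, nxt: int) -> List[float]:
        # ITERATE: UPDATE state SET tsum=tsum+Next, cnt=cnt+1
        self.tsum += nxt
        self.cnt += 1
        out: List[float] = []
        if self.cnt % self.size == 0:
            # INSERT INTO RETURN ... WHERE cnt % size = 0, then reset the state.
            out.append(self.tsum / self.cnt)
            self.tsum, self.cnt = 0, 0
        return out

    def terminate(self) -> List[float]:
        # TERMINATE is empty, so the aggregate is nonblocking; a trailing
        # partial window is simply dropped.
        return []


def run_uda(uda: TumbleAvg, stream: Iterable[int]) -> Iterator[float]:
    """Feed the first tuple to INITIALIZE and the rest to ITERATE, yielding
    results as soon as they are produced; TERMINATE runs only if input ends."""
    first = True
    for x in stream:
        yield from (uda.initialize(x) if first else uda.iterate(x))
        first = False
    yield from uda.terminate()


if __name__ == "__main__":
    print(list(run_uda(TumbleAvg(size=3), [1, 2, 3, 4, 5, 6, 7])))  # [2.0, 5.0]
    # Also works on an unbounded stream, since nothing waits for TERMINATE.
    print(list(itertools.islice(run_uda(TumbleAvg(size=3), itertools.count(1)), 2)))  # [2.0, 5.0]
```

The second call is the interesting one: because every result is emitted from 
ITERATE, the aggregate produces output on an infinite input, which is exactly 
what makes it usable over a stream.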

Emitting tuples downstream is done with 'INSERT INTO RETURN'. If you have 
'INSERT INTO RETURN' in the TERMINATE block, your aggregate is blocking and 
cannot be executed over a stream, because TERMINATE only runs once the input 
is exhausted, which never happens for an unbounded stream. There are some 
interesting samples, like finding patterns over a stream, in Section 5 of the 
paper [1]. They even show an implementation of a Turing machine using UDAs. 
They also use 'union' and UDAs to implement stream joins instead of a blocking 
join operator. A sample can be found in [2].
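As I understand the union trick, the two streams are merged into one tagged 
stream and a single UDA keeps a table per side, emitting matches as each tuple 
arrives, i.e. a symmetric hash join. Here is a rough Python sketch under that 
assumption; the function name union_join, the ("L"/"R", key, payload) layout and 
the unbounded state are my own simplifications, and I have not checked it 
against the Stream Mill code in [2].

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, Iterator, List, Tuple

# A tuple of the unioned stream: (side, join key, payload); side is "L" or "R".
Tagged = Tuple[str, Hashable, object]


def union_join(unioned: Iterable[Tagged]) -> Iterator[Tuple[Hashable, object, object]]:
    """Nonblocking equi-join over the UNION of two tagged streams.

    Every arriving tuple is stored in its side's table and immediately probed
    against the other side's table (a symmetric hash join), so all results are
    produced per tuple and none wait for the end of the input. State is never
    expired here; a real query would bound it with a window.
    """
    tables: Dict[str, DefaultDict[Hashable, List[object]]] = {
        "L": defaultdict(list),
        "R": defaultdict(list),
    }
    for side, key, payload in unioned:
        if side not in tables:
            raise ValueError(f"unknown side tag: {side!r}")
        other = "R" if side == "L" else "L"
        tables[side][key].append(payload)
        # .get() avoids creating empty entries in the other side's table.
        for match in tables[other].get(key, []):
            # Always report matches as (key, left_payload, right_payload).
            if side == "L":
                yield (key, payload, match)
            else:
                yield (key, match, payload)


if __name__ == "__main__":
    stream = [("L", 1, "a"), ("R", 1, "x"), ("R", 2, "y"), ("L", 2, "b"), ("L", 1, "c")]
    print(list(union_join(stream)))  # [(1, 'a', 'x'), (2, 'b', 'y'), (1, 'c', 'x')]
```

A join result is emitted as soon as its second tuple arrives, so nothing is 
deferred to TERMINATE.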

I was interested in this paper mainly because:
 - It looks like we can even do pattern-matching types of queries over streams 
using UDAs. I am not sure how complicated this would be using general SQL.
 - It looks like we could use this as the intermediate model into which other 
languages, DSLs, and APIs are transformed. I have yet to understand how well 
this will work, but the concept of UDAs seems pretty interesting to me, given 
that we can even model a Turing machine with them.
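To get a feel for the pattern-matching point, here is a toy detector written in 
the same state-plus-ITERATE style: it keeps a little state between tuples and 
emits as soon as a pattern (here, k strictly increasing values in a row) 
completes. This is my own illustration, not one of the Section 5 examples from 
[1]; the function name rising_runs and the choice of pattern are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def rising_runs(stream: Iterable[float], k: int = 3) -> Iterator[int]:
    """Yield the position at which a run of k strictly increasing values is
    completed. The previous value and current run length play the role of
    the UDA's state table; matches are emitted per tuple, so it never blocks."""
    if k < 1:
        raise ValueError("k must be >= 1")
    prev: Optional[float] = None
    run = 0
    for i, x in enumerate(stream):
        # Extend the run on a rise, otherwise start a new run at this tuple.
        run = run + 1 if prev is not None and x > prev else 1
        prev = x
        if run == k:
            yield i


if __name__ == "__main__":
    print(list(rising_runs([1, 2, 3, 2, 3, 4, 5], k=3)))  # [2, 5]
```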

I found several other references in this paper that explain or motivate some 
of the concepts here. I'll let you know if I find anything interesting in those.

[1] http://www.cs.ucla.edu/~zaniolo/papers/vldb04cr.pdf
[2] http://wis.cs.ucla.edu/wis/stream-mill/examples/nexmark.html

> High-Level Language for Samza
> -----------------------------
>
>                 Key: SAMZA-390
>                 URL: https://issues.apache.org/jira/browse/SAMZA-390
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Raul Castro Fernandez
>            Priority: Minor
>              Labels: project
>
> Discussion about high-level languages to define Samza queries. Queries are 
> defined in this language and transformed to a dataflow graph where the nodes 
> are Samza jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
