[ 
https://issues.apache.org/jira/browse/SAMZA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299918#comment-14299918
 ] 

Jay Kreps commented on SAMZA-390:
---------------------------------

I read a little bit on the plane and found a few comparisons of CQL and
StreamSQL that were helpful. The key difference is the "tuple-driven" vs.
"time-driven" distinction. Personally, I thought tuple-driven is a much closer
fit to the underlying Kafka concepts (an ordered stream of tuples).

Some links:
Basic overview of CQL:
http://www.it.uu.se/research/group/udbl/Theses/RobertKajicBSc.pdf
This paper dives into the tuple/time distinction and proposes a fix:
http://cs.brown.edu/~ugur/streamsql.pdf

I also think the heartbeat approach that the CQL people take 
(http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=78B3CFA375CA62DD600C8A7705D17FD8?doi=10.1.1.90.1199&rep=rep1&type=pdf)
actually doesn't work well in a modern, geographically distributed 
environment. Just because the Chicago datacenter can't heartbeat to the 
processing cluster doesn't mean it isn't recording data. I think in practice 
you have to model the concept of late data directly.
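
To make "model late data directly" a bit more concrete, here is a rough sketch
of one way it could look: a per-window counter keyed by event time that never
closes a window on a heartbeat, and instead emits a revised count (flagged as
late) whenever an old tuple shows up. Everything here is hypothetical and for
illustration only (the LateAwareWindowCounter name, the window_size parameter,
integer event times); it is not a Samza or Kafka API and not a proposal for the
actual design.

```python
from collections import defaultdict


class LateAwareWindowCounter:
    """Hypothetical sketch: counts tuples per fixed-size event-time window.

    Windows are never sealed by a heartbeat. A tuple whose window is older
    than the newest window seen so far is still counted, and the revised
    count is returned with is_late=True so downstream can correct itself.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.counts = defaultdict(int)  # window start -> tuple count
        self.newest_window = None       # start of the newest window seen

    def process(self, event_time):
        """Consume one tuple; return (window_start, count, is_late)."""
        window_start = event_time - event_time % self.window_size
        is_late = (self.newest_window is not None
                   and window_start < self.newest_window)
        if self.newest_window is None or window_start > self.newest_window:
            self.newest_window = window_start
        self.counts[window_start] += 1
        return window_start, self.counts[window_start], is_late


counter = LateAwareWindowCounter(window_size=10)
# The tuple with event time 7 arrives after tuples from window [10, 20).
events = [1, 4, 12, 15, 7, 23]
results = [counter.process(t) for t in events]
```

In this run the tuple with event time 7 updates window 0 to a count of 3 and is
flagged as late, rather than being dropped because its window was assumed to be
complete.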

> High-Level Language for Samza
> -----------------------------
>
>                 Key: SAMZA-390
>                 URL: https://issues.apache.org/jira/browse/SAMZA-390
>             Project: Samza
>          Issue Type: New Feature
>          Components: sql
>            Reporter: Raul Castro Fernandez
>            Priority: Minor
>              Labels: project
>         Attachments: StreamSQLforSAMZA-v0.1.docx.docx
>
>
> Discussion about high-level languages to define Samza queries. Queries are 
> defined in this language and transformed to a dataflow graph where the nodes 
> are Samza jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
