[ https://issues.apache.org/jira/browse/SAMZA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299918#comment-14299918 ]
Jay Kreps commented on SAMZA-390:
---------------------------------
Read a little bit on the plane. I found a few comparisons of CQL and StreamSQL
that were helpful. The key difference is the "tuple-driven" vs. "time-driven"
distinction. Personally, I think the tuple-driven model is a much closer fit to
the underlying Kafka concepts (an ordered stream of tuples).
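
To make the distinction concrete, here is a minimal sketch of a running sum
evaluated both ways. It is an illustration under assumptions, not code from
either system: the stream contents and the function names are hypothetical.
The tuple-driven version treats every arriving tuple as its own step, while
the time-driven version treats tuples that share a timestamp as simultaneous
and emits once per timestamp.

```python
from itertools import groupby

# Hypothetical stream of (timestamp, value) tuples, in log order.
STREAM = [(1, 10), (1, 20), (2, 5), (3, 7), (3, 1)]


def tuple_driven_sums(stream):
    """Emit a running sum after every tuple (each arrival is one step)."""
    total, out = 0, []
    for ts, value in stream:
        total += value
        out.append((ts, total))
    return out


def time_driven_sums(stream):
    """Emit a running sum once per timestamp.

    Tuples with the same timestamp are treated as simultaneous, so
    intermediate states within a timestamp are never visible.
    Assumes the stream is already ordered by timestamp.
    """
    total, out = 0, []
    for ts, batch in groupby(stream, key=lambda t: t[0]):
        total += sum(value for _, value in batch)
        out.append((ts, total))
    return out
```

On the sample stream the tuple-driven version produces five results and the
time-driven version three, because the intermediate sums at timestamps 1 and 3
only exist when each tuple is its own step.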
Some links:
Basic overview of CQL:
http://www.it.uu.se/research/group/udbl/Theses/RobertKajicBSc.pdf
This paper dives into the tuple/time distinction and proposes a fix:
http://cs.brown.edu/~ugur/streamsql.pdf
I also think the heartbeat approach that the CQL people take
(http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=78B3CFA375CA62DD600C8A7705D17FD8?doi=10.1.1.90.1199&rep=rep1&type=pdf)
doesn't work well in a modern, geographically distributed environment. Just
because the Chicago datacenter can't heartbeat to the
processing cluster doesn't mean it isn't recording data. I think in practice
you have to model the concept of late data directly.
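
One possible way to model late data directly is sketched below. The comment
does not prescribe a mechanism, so everything here is an assumption: the
class name, the fixed event-time windows, and the "allowed lateness" bound are
hypothetical choices made for illustration. Progress is tracked by the largest
event time seen so far rather than by heartbeats, and a record that arrives
after its window has passed is still counted and flagged as late, unless it is
older than the allowed lateness.

```python
from collections import defaultdict


class LateAwareCounter:
    """Counts events per event-time window and classifies late arrivals.

    Hypothetical sketch: window and allowed_lateness are in seconds.
    """

    def __init__(self, window=60, allowed_lateness=120):
        self.window = window
        self.allowed_lateness = allowed_lateness
        self.counts = defaultdict(int)  # window start -> event count
        self.max_event_time = 0         # largest event time seen so far
        self.dropped = []               # event times that were too late

    def add(self, event_time):
        """Record one event; return 'on_time', 'late', or 'dropped'."""
        self.max_event_time = max(self.max_event_time, event_time)
        start = event_time - event_time % self.window
        end = start + self.window

        # Too old: the window closed more than allowed_lateness ago.
        if self.max_event_time >= end + self.allowed_lateness:
            self.dropped.append(event_time)
            return "dropped"

        self.counts[start] += 1
        # The window had already passed, so this count is a revision.
        if self.max_event_time >= end:
            return "late"
        return "on_time"
```

For example, a datacenter that was cut off for a while would deliver records
with old event times once it reconnects; in this sketch they would revise the
earlier window counts instead of being silently lost.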
> High-Level Language for Samza
> -----------------------------
>
> Key: SAMZA-390
> URL: https://issues.apache.org/jira/browse/SAMZA-390
> Project: Samza
> Issue Type: New Feature
> Components: sql
> Reporter: Raul Castro Fernandez
> Priority: Minor
> Labels: project
> Attachments: StreamSQLforSAMZA-v0.1.docx.docx
>
>
> Discussion about high-level languages to define Samza queries. Queries are
> defined in this language and transformed to a dataflow graph where the nodes
> are Samza jobs.