[
https://issues.apache.org/jira/browse/SAMZA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313238#comment-14313238
]
Yi Pan (Data Infrastructure) commented on SAMZA-390:
----------------------------------------------------
We had a discussion on some remaining SQL-language-related issues last Friday,
and here is my summary:
# Support for PARTITION
## Samza needs to know the PARTITION key and count passed down by the SQL
parser/planner
## The PARTITION key can be added as an extension to SQL in Calcite. If it is
missing, Samza will choose a random partition
## The PARTITION count is a system property and should not be enforced in the
SQL grammar. There are three cases we need to handle:
### The topic already exists in Kafka. Samza only needs to read the partition
count from Kafka metadata.
### The topic does not exist and auto-creation of topics is allowed. Samza will
auto-create the topic with the default partition count.
### The topic does not exist and auto-creation is not allowed. The user must
first perform an admin operation to create the topic; Samza can then get the
PARTITION count from Kafka.
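A rough sketch of the three PARTITION count cases and the random-partition
fallback above, in Python for brevity. Everything here is hypothetical:
FakeKafkaMetadata is an in-memory stand-in rather than a Kafka or Samza API, the
default of 8 partitions is made up, and CRC32 is just an arbitrary stable hash,
not what Kafka or Samza actually uses.

```python
import random
import zlib
from dataclasses import dataclass
from typing import Dict, Optional


class TopicMissingError(Exception):
    """The topic must be created by an admin before the job can run."""


@dataclass
class FakeKafkaMetadata:
    """Hypothetical in-memory stand-in for Kafka topic metadata."""
    partition_counts: Dict[str, int]
    auto_create: bool = True
    default_partitions: int = 8  # made-up default for illustration


def resolve_partition_count(topic: str, meta: FakeKafkaMetadata) -> int:
    count = meta.partition_counts.get(topic)
    if count is not None:
        # Case 1: topic exists, so just read the count from metadata.
        return count
    if meta.auto_create:
        # Case 2: topic is auto-created with the default partition count.
        meta.partition_counts[topic] = meta.default_partitions
        return meta.default_partitions
    # Case 3: an admin op is required before the count can be read.
    raise TopicMissingError(f"topic {topic!r} must be created first")


def choose_partition(key: Optional[str], partition_count: int) -> int:
    if key is None:
        # No PARTITION key in the query: pick a random partition.
        return random.randrange(partition_count)
    # Same key always maps to the same partition (stable hash assumed).
    return zlib.crc32(key.encode("utf-8")) % partition_count
```

The point of the sketch is only that the count never appears in the SQL text:
the planner passes down the key (if any), and the count is looked up from the
system.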
# Schema and Metadata support
## Schema definition and DDL
### We have decided that a metadata registry to store schema definitions from
DDL is optional. The impact is on whether validation happens at compile time or
at runtime: compile-time validation is possible when schema metadata is
supplied.
### Two examples: with an Avro schema registry, we can implement a schema
metadata interface such that the Calcite validation module can be applied to
perform compile-time validation; with JSON, compile-time validation would be
skipped and we would instead get validation errors at runtime.
## Tuple schema
### If we define a tuple schema in a stream, should we support multiple schemas
in a single stream? There seem to be possible non-SQL use cases for multiple
schemas in a single stream, e.g. splitting a stream into multiple streams
according to schema. It seems reasonable to ask the Samza physical operator
to support multiple schemas in a single stream (i.e. the schema is associated
with the tuple) while no SQL language support is needed. The feature can
potentially be used by other DSLs that may implement multiple schemas in a
single stream.
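To make the compile-time vs. runtime validation trade-off above concrete, here
is a minimal sketch in Python. It is not Calcite's validator or any Samza
interface: the schema is modeled as a plain dict of field names, and both
function names are hypothetical. A query's field references are checked up
front when a schema is available (the Avro registry case); otherwise the check
is skipped and a bad field only surfaces when a tuple is processed (the JSON
case).

```python
from typing import Any, Dict, List, Optional

# Hypothetical schema representation: field name -> expected Python type.
Schema = Dict[str, type]


def validate_at_compile_time(fields: List[str],
                             schema: Optional[Schema]) -> bool:
    """Return True if validated, False if skipped (no schema metadata)."""
    if schema is None:
        # No registry entry (e.g. plain JSON): defer to runtime.
        return False
    unknown = [f for f in fields if f not in schema]
    if unknown:
        raise ValueError(f"unknown fields at compile time: {unknown}")
    return True


def project(tuple_: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Runtime path: a missing field is only detected per tuple."""
    try:
        return {f: tuple_[f] for f in fields}
    except KeyError as e:
        raise RuntimeError(f"field {e.args[0]!r} missing at runtime") from e
```

With a schema supplied, a query referencing a nonexistent field fails before
any tuple is read; without one, the same query is accepted and fails in
project() on the first tuple that lacks the field.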
# Window syntax and semantics
## How much syntax support do we need from the SQL language? I opened a ticket
to track that: SAMZA-551
## Tuple-based vs. time-based windows. I opened a ticket to track that as well:
SAMZA-552
> High-Level Language for Samza
> -----------------------------
>
> Key: SAMZA-390
> URL: https://issues.apache.org/jira/browse/SAMZA-390
> Project: Samza
> Issue Type: New Feature
> Components: sql
> Reporter: Raul Castro Fernandez
> Priority: Minor
> Labels: project
> Attachments: StreamSQLforSAMZA-v0.1.docx.docx
>
>
> Discussion about high-level languages to define Samza queries. Queries are
> defined in this language and transformed to a dataflow graph where the nodes
> are Samza jobs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)