[
https://issues.apache.org/jira/browse/CALCITE-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170512#comment-17170512
]
liupengcheng edited comment on CALCITE-4146 at 8/4/20, 3:04 AM:
----------------------------------------------------------------
>Hmm, the EMIT clause will be at the end of the SQL query, and it will propagate through every
>relational operator. However, technically, I feel like only a
>TableFunctionScanRel that works with a stream will be affected. You can think
>of it as a macro batch streaming system implementing it. In that system, EMIT
>controls the size and frequency of each macro batch (from the
>TableFunctionScanRel), and all other joins, sorts, and aggregates are applied to
>each macro batch.
Hi, [~amaliujia],
Yes, I understand this works like a micro batch, but in a real implementation for
streams, some operators on top of a TableFunctionScanRel need to know where the
micro batch ends and when to perform the calculation and EMIT data to their
consumers (e.g. a two-stream join of windowTableFunctionScanRel), while others
may not (e.g. map-like operations). So my question is: will the EMIT syntax
affect both the TableFunctionScanRel and all of its downstream operators?
For example:
```
SELECT *
FROM windowTableFunctionScanRel1 t1
JOIN windowTableFunctionScanRel2 t2 ON t1.id = t2.id
EMIT AFTER WATERMARK;
```
There are probably two possible implementations:
1. EMIT is bound only to the window join, and the windowTableFunctionScanRel
just works like a map (appending some extra window attributes).
2. EMIT is bound to the windowTableFunctionScanRel and all of its downstream
operators. The windowTableFunctionScanRel then buffers data and emits it
according to the specified strategy (e.g. every 1 minute), and so does the
window join.
Which one is preferred? And if there are other filter or map operations,
should those operators also follow the EMIT strategy?
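A minimal sketch of option 1, to make the difference concrete. It assumes tumbling windows, a watermark W meaning no further events with ts < W will arrive, and rows as plain dicts; `tumble`, `WindowJoin`, `add`, and `on_watermark` are hypothetical names for illustration, not Calcite APIs. The window TVF is a stateless map that only appends window bounds, while the join alone buffers state and applies the EMIT AFTER WATERMARK strategy.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Row = Dict[str, int]


def tumble(row: Row, size: int) -> Row:
    """Hypothetical stateless map step: append tumbling-window bounds to a row.
    Under option 1 this is all the window TVF does; it never buffers."""
    start = row["ts"] - row["ts"] % size
    return {**row, "window_start": start, "window_end": start + size}


class WindowJoin:
    """Hypothetical window join that owns the EMIT AFTER WATERMARK strategy:
    it buffers both inputs per (window_end, id) and only emits a window's
    joined rows once the watermark reaches that window's end."""

    def __init__(self) -> None:
        self.state: DefaultDict[Tuple[int, int], Tuple[List[Row], List[Row]]] = (
            defaultdict(lambda: ([], []))
        )

    def add(self, side: int, row: Row) -> None:
        # side 0 = left input (t1), side 1 = right input (t2)
        self.state[(row["window_end"], row["id"])][side].append(row)

    def on_watermark(self, watermark: int) -> List[Tuple[Row, Row]]:
        # Assumes watermark W means no event with ts < W is still to come,
        # so every window with window_end <= W is complete and can fire.
        out: List[Tuple[Row, Row]] = []
        for key in sorted(k for k in self.state if k[0] <= watermark):
            left, right = self.state.pop(key)
            out.extend((lrow, rrow) for lrow in left for rrow in right)
        return out


join = WindowJoin()
join.add(0, tumble({"ts": 1, "id": 7}, size=5))
join.add(0, tumble({"ts": 3, "id": 8}, size=5))
join.add(1, tumble({"ts": 2, "id": 7}, size=5))
join.add(1, tumble({"ts": 6, "id": 7}, size=5))
early = join.on_watermark(4)    # window [0, 5) is not complete yet
emitted = join.on_watermark(5)  # window [0, 5) fires; [5, 10) stays buffered
```

Under option 2, `tumble` would instead hold rows until the strategy fires, and a filter or map placed between it and the join would just see the batches it releases; under option 1 such operators stay per-row and stateless.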
> Implement EMIT Syntax
> ---------------------
>
> Key: CALCITE-4146
> URL: https://issues.apache.org/jira/browse/CALCITE-4146
> Project: Calcite
> Issue Type: New Feature
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
>
> The goal is to support the following syntax:
> {code:sql}
> SELECT clause
> FROM TUMBLE/HOP/SESSION
> [EMIT]
> {code}
> EMIT Syntax is proposed in [One SQL to Rule Them
> All|https://arxiv.org/pdf/1905.12133.pdf]. This idea proposes a way to allow
> streaming SQL queries to control materialization latency.
> Regarding the types of emit strategies, due to page limits that paper only
> lists two strategies; Calcite should support at least four categories:
> 1. Event time triggers. Emitting depends on the relationship between the
> watermark and the event timestamps of events. Handling late data is also
> included in this category.
> 2. Processing time triggers. Emitting depends on the system clock. This is
> a natural way to emit, e.g. emit the current result every hour without
> considering whether the data in a window is already complete.
> 3. Data-driven triggers. E.g. emit when the number of accumulated events
> exceeds a threshold (e.g. emit once 1000 events have accumulated).
> 4. Composite triggers. There is a need to combine 1, 2, and 3 with OR and AND
> to achieve better latency control.
> More context is discussed in
> [CALCITE-3272|https://issues.apache.org/jira/browse/CALCITE-3272?focusedCommentId=17166580&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17166580]
> and the [EMIT syntax proposal for event-timestamp semantic
> windowing|https://lists.apache.org/thread.html/r5bd9a6f7af2c0cd81aecd4de512fd889fbf15f112cc3704f188b1d4f%40%3Cdev.calcite.apache.org%3E]
> email thread.
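A minimal sketch of how the four trigger categories in the description above could compose, assuming each trigger is simply a predicate over some per-window state. `PaneState`, `after_watermark`, `every`, `after_count`, `any_of`, and `all_of` are hypothetical names, not Calcite APIs, and late-data handling is left out; the OR/AND composition is loosely similar to the composite triggers in Apache Beam.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PaneState:
    """Hypothetical per-window state that a trigger inspects."""
    window_end: int  # event-time end of the window
    watermark: int   # current event-time watermark
    now: int         # current processing time (system clock)
    last_emit: int   # processing time of the last emission
    count: int       # events accumulated since the last emission


Trigger = Callable[[PaneState], bool]


def after_watermark() -> Trigger:
    # 1. Event time trigger: fire once the watermark passes the window end.
    return lambda s: s.watermark >= s.window_end


def every(interval: int) -> Trigger:
    # 2. Processing time trigger: fire periodically, complete window or not.
    return lambda s: s.now - s.last_emit >= interval


def after_count(n: int) -> Trigger:
    # 3. Data-driven trigger: fire once n events have accumulated.
    return lambda s: s.count >= n


def any_of(*triggers: Trigger) -> Trigger:
    # 4a. Composite trigger: fires if any child fires (OR).
    return lambda s: any(t(s) for t in triggers)


def all_of(*triggers: Trigger) -> Trigger:
    # 4b. Composite trigger: fires only if every child fires (AND).
    return lambda s: all(t(s) for t in triggers)


# Emit when the window is complete, or early every 60 time units,
# or as soon as 1000 events have accumulated.
trigger = any_of(after_watermark(), every(60), after_count(1000))
```

Modeling triggers as plain predicates keeps composition trivial; a real engine would also need to reset `last_emit` and `count` after each emission and decide what to do with data arriving after the watermark.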
--
This message was sent by Atlassian Jira
(v8.3.4#803005)