Multi-sink is something I’ve wanted to do for a while. (I know that Hive uses 
multi-sink plans for insert, but it has never been able to model them using 
Calcite.)

We basically need a DAG. The problems are how to model divergent data flows, 
and how to model the “controller” that waits for all of the sinks to finish. 

The Spool operator might be a good way to model the fact that there are 
multiple consumers of the source scan. 
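
To make the idea concrete, here is a toy sketch in plain Python, not Calcite code. The `Spool` class, `scan_source` and the two sink predicates are hypothetical names invented for illustration; the only point is that the source is read once and the buffered rows are replayed to each consumer.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]


class Spool:
    """Hypothetical buffer: reads its input once, replays it to many consumers."""

    def __init__(self, source: Callable[[], Iterable[Row]]) -> None:
        self._source = source
        self._buffer: Optional[List[Row]] = None
        self.scans = 0  # number of times the underlying source was actually read

    def open(self) -> Iterator[Row]:
        # The first consumer triggers the scan; later consumers reuse the buffer.
        if self._buffer is None:
            self.scans += 1
            self._buffer = list(self._source())
        return iter(self._buffer)


def scan_source() -> List[Row]:
    # Stand-in for the shared source table scan.
    return [(1, "a"), (2, "b"), (3, "c")]


def run_demo() -> Tuple[List[Row], List[Row], int]:
    spool = Spool(scan_source)
    # Two divergent data flows, each feeding its own sink.
    sink1 = [row for row in spool.open() if row[0] % 2 == 1]
    sink2 = [row for row in spool.open() if row[0] >= 2]
    return sink1, sink2, spool.scans
```

In this sketch the plan is a DAG rather than a tree because both sink branches share the single `Spool` node as their input.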

As for the controller: how about a Union, say “select count(*) from sink1 union 
all select count(*) from sink2”. (Strictly, you don’t need to count, but you 
need to wait until each sink has completed, and you need the row-types to be 
union-compatible, so Union is pretty good.)
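
As a rough illustration of the query shape only, the sketch below uses Python's built-in sqlite3 module rather than Calcite. The tables `src`, `sink1`, `sink2` and their columns are assumptions made up for the example. The two sinks have different row types, yet the count(*) branches are union-compatible, and the final query cannot return until both inserts have run.

```python
import sqlite3
from typing import List, Tuple


def run_multi_sink() -> List[Tuple[int]]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one source, two sinks with different row types.
    conn.executescript(
        """
        CREATE TABLE src (id INTEGER, amount INTEGER);
        CREATE TABLE sink1 (id INTEGER, amount INTEGER);
        CREATE TABLE sink2 (id INTEGER);
        INSERT INTO src VALUES (1, 10), (2, 200), (3, 30), (4, 400);
        """
    )
    # The divergent flows: each sink receives a different slice of the source.
    conn.execute("INSERT INTO sink1 SELECT id, amount FROM src WHERE amount >= 100")
    conn.execute("INSERT INTO sink2 SELECT id FROM src WHERE id <> 2")
    # The "controller": one row per sink, all with the same single-column row type.
    rows = conn.execute(
        "SELECT count(*) FROM sink1 UNION ALL SELECT count(*) FROM sink2"
    ).fetchall()
    conn.close()
    return rows
```

Here the inserts simply run one after another; in a real multi-sink plan the Union would be the single root that depends on every sink branch.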

I look forward to seeing some optimization rules on Spool. E.g. project away 
columns that none of the consumers need, and similarly filter out rows that 
none of them need. 
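
The logic such rules would need can be sketched in a few lines of plain Python. This is not Calcite's rule API; `prune_spool_columns`, `combined_filter` and the example column names are hypothetical. The spool keeps a column if at least one consumer reads it, and keeps a row if at least one consumer's filter accepts it.

```python
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def prune_spool_columns(
    spool_columns: List[str], consumer_needs: Dict[str, Set[str]]
) -> List[str]:
    """Keep only the columns that at least one consumer reads, in original order."""
    needed: Set[str] = set().union(*consumer_needs.values())
    return [column for column in spool_columns if column in needed]


def combined_filter(consumer_filters: List[Predicate]) -> Predicate:
    """A row may be dropped before the spool only if every consumer rejects it,
    so the filter pushed below the spool is the OR of the consumers' filters."""
    return lambda row: any(accepts(row) for accepts in consumer_filters)
```

For example, with spool columns id, name, amount, ts, where one consumer reads {id, amount} and the other reads {id, name}, the pruned spool keeps id, name and amount and drops ts.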

Julian

> On Jun 5, 2019, at 5:12 AM, Yuzhao Chen <[email protected]> wrote:
> 
> This seems to be a request for multi-sink insert.
> 
>> 3) Calcite transforms it into multiple TableModifies
> 
> Instead of letting Calcite transform it into multiple TableModifies, I think 
> you should do it yourself, then send each TableModify to the Calcite sqlToRel 
> converter.
> 
> If you want the inserts into multiple sinks to run in the same plan, that is 
> another topic; we may promote one sink node tree at a time and finally 
> merge all the trees.
> 
> Best,
> Danny Chan
> On Jun 5, 2019, at 7:58 PM +0800, [email protected] wrote:
>> 
>> 3) Calcite transforms it into multiple TableModifies
