Multi-sink is something I’ve wanted to do for a while. (I know that Hive uses multi-sink plans for INSERT, but it has never been able to model them using Calcite.)
We basically need a DAG. The problems are how to model divergent data flows, and how to model the “controller” that waits for all of the sinks to finish.

The Spool operator might be a good way to model the fact that there are multiple consumers of the source scan.

As for the controller: how about a Union, say “select count(*) from sink1 union all select count(*) from sink2”? (Strictly, you don’t need to count, but you do need to wait until each sink has completed, and you need the row types to be union-compatible, so Union is pretty good.)

I look forward to seeing some optimization rules on Spool: for example, projecting away columns that none of the consumers need, and similarly for filters.

Julian

> On Jun 5, 2019, at 5:12 AM, Yuzhao Chen wrote:
>
> This seems to be a request for multi-sink insert.
>
>> 3) Calcite transforms it into multiple TableModifies
>
> Instead of letting Calcite transform it into multiple TableModifies, I think you
> should do it yourself, then send each TableModify to Calcite's SqlToRelConverter.
>
> If you want to insert into multiple sinks as tasks that run in the same plan,
> that is another topic; we may promote one sink node tree at a time and finally
> merge all the trees.
>
> Best,
> Danny Chan
> On Jun 5, 2019, at 7:58 PM +0800, [email protected] wrote:
>>
>> 3) Calcite transforms it into multiple TableModifies
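For concreteness, here is a toy, in-memory Java model of the plan shape described above: the source scan is read once into a spool, each sink consumes the spooled rows, and the controller is a union-all of one count row per sink, so it cannot produce its result until every sink has finished. This is not Calcite's API; `MultiSinkSketch`, `Sink`, and `execute` are hypothetical names, the sinks run sequentially, and each sink's filter stands in for whatever per-sink logic sits above the spool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical toy model (not Calcite's API): scan -> spool -> N sinks -> union-all controller. */
public class MultiSinkSketch {

    /** Hypothetical sink: a filtered "TableModify" that records the rows it inserts. */
    static final class Sink {
        final String name;
        final Predicate<int[]> filter;
        final List<int[]> table = new ArrayList<>();

        Sink(String name, Predicate<int[]> filter) {
            this.name = name;
            this.filter = filter;
        }

        /** Consumes the spooled rows and returns its row count, like "select count(*) from sinkN". */
        long run(List<int[]> spooled) {
            for (int[] row : spooled) {
                if (filter.test(row)) {
                    table.add(row);
                }
            }
            return table.size();
        }
    }

    /**
     * Evaluates the DAG: the source is scanned once into a spool, every sink reads
     * from the spool, and the controller is a union-all with one count row per sink,
     * so its result is complete only after every sink has completed.
     */
    static List<Long> execute(List<int[]> source, List<Sink> sinks) {
        List<int[]> spool = new ArrayList<>(source); // scan once, buffer for all consumers
        List<Long> unionAll = new ArrayList<>();
        for (Sink sink : sinks) {
            unionAll.add(sink.run(spool)); // one count(*) row per sink
        }
        return unionAll;
    }

    public static void main(String[] args) {
        // Rows are {empno, deptno}.
        List<int[]> emp = List.of(new int[] {1, 10}, new int[] {2, 20}, new int[] {3, 10});
        Sink sink1 = new Sink("sink1", r -> r[1] == 10);
        Sink sink2 = new Sink("sink2", r -> r[1] == 20);
        System.out.println(execute(emp, List.of(sink1, sink2))); // prints [2, 1]
    }
}
```

A real engine could run the sinks concurrently; the union-all still works as the controller, because it only completes once all of its inputs have.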
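The Spool rules suggested above could be driven by two small computations, sketched here under the assumption that each consumer's column usage and filter are already known (`SpoolPruning`, `requiredColumns`, and `pushableFilter` are hypothetical helpers, not Calcite rules). For columns, the spool only needs the union of the columns its consumers read. For filters, a row can be discarded below the spool only if no consumer wants it, so the pushable filter is the OR of the consumers' filters, and each consumer must still keep its own filter above the spool.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical helpers (not Calcite rules) for pruning columns and rows below a Spool. */
public class SpoolPruning {

    /** Columns the spool must keep: the union of the columns any consumer references. */
    static Set<Integer> requiredColumns(List<Set<Integer>> consumerColumns) {
        Set<Integer> keep = new TreeSet<>();
        for (Set<Integer> cols : consumerColumns) {
            keep.addAll(cols);
        }
        return keep;
    }

    /**
     * Filter that can be pushed below the spool: a row may be dropped only if no
     * consumer wants it, i.e. the disjunction (OR) of the consumers' filters.
     * Each consumer must still apply its own filter above the spool.
     */
    static <T> Predicate<T> pushableFilter(List<Predicate<T>> consumerFilters) {
        Predicate<T> any = row -> false; // OR over zero consumers: keep nothing
        for (Predicate<T> f : consumerFilters) {
            any = any.or(f);
        }
        return any;
    }

    public static void main(String[] args) {
        // Consumer 1 reads columns {0, 2}; consumer 2 reads {2, 3}; column 1 is never read.
        System.out.println(requiredColumns(List.of(Set.of(0, 2), Set.of(2, 3)))); // prints [0, 2, 3]

        List<Predicate<Integer>> filters = List.of(x -> x < 10, x -> x > 100);
        Predicate<Integer> pushed = pushableFilter(filters);
        System.out.println(pushed.test(50)); // prints false: neither consumer wants 50
    }
}
```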
