Sorry, I replied on the wrong thread; I'm reposting my answer here :)

As there was already a discussion in the doc, I'll just summarize my
opinions here on the proposed execution of this FLIP.

I think we should avoid exposing internal details, which I consider Calcite
to be part of, and instead reuse what we already have to define an AST from
the Table API, which is what I'll refer to in this mail as the Operation tree.

First of all, the reason I think this FLIP is not a good idea is that it
proposes to expose types out of our control, that is, an API we cannot
control and may realistically never be able to stabilize. A Calcite bump in
the table project is already pretty hard today, as shown by tasks like
https://github.com/apache/flink/pull/13577, and this will make it even
harder. Essentially it will couple us to Calcite even more, and create a
different but still big maintenance/complexity burden of the very kind we
would like to get rid of with this FLIP.

There are also some technical aspects that seem to me a bit overlooked here:

* What about Scala? Is flink-table-planner-spi going to be a Scala module
with the related suffix? I see you want to expose a couple of types that are
currently implemented in Scala, and making this module Scala-dependent makes
it even more complicated to ship both the modules that use it and
flink-table-planner-loader.
* Are you sure exposing the Calcite interfaces is going to be enough? Don't
you also require some instance-specific methods, e.g.
FlinkTypeFactory#toLogicalType? What if at some point you need to expose
something like FlinkTypeFactory itself (see the sketch below)? How do you
plan to support it and stabilize it in the long term?
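
To make that second point concrete, here is a minimal sketch (Java, using
the planner internals as they exist today; not a stable API, just an
illustration) of the kind of instance-specific conversion a decoupled Hive
parser would still need:

    // Minimal sketch: the planner-internal conversion the bullet above
    // refers to. FlinkTypeFactory lives inside the planner module and is
    // not covered by the Calcite-facing interfaces the FLIP proposes.
    import org.apache.calcite.rel.type.RelDataType;
    import org.apache.flink.table.planner.calcite.FlinkTypeFactory;
    import org.apache.flink.table.types.logical.LogicalType;

    public final class TypeBridgeSketch {
        private TypeBridgeSketch() {}

        // Converting a Calcite RelDataType back to Flink's LogicalType
        // needs FlinkTypeFactory, a planner-internal class, not just the
        // plain Calcite interfaces.
        static LogicalType toFlinkType(RelDataType relDataType) {
            return FlinkTypeFactory.toLogicalType(relDataType);
        }
    }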

Now let me talk a bit about the Operation tree. For those who don't know
what it is, it's the pure Flink AST for defining DML, used for converting
the Table API DSL to an AST the planner can manipulate. Essentially, it's
our own version of the RelNode tree/RexNode tree. This Operation tree can
already be used by Hive, without any API changes on the Table side. You just
need to downcast to TableEnvironmentImpl to reach getPlanner() and then use
Planner#translate, or alternatively you can add getPlanner to
TableEnvironmentInternal directly (see the sketch below). From what I've
seen of your use case, and please correct me if I'm wrong, you can implement
your SQL -> Operation tree layer without substantial changes on either side.
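
To illustrate, here is a rough sketch of that wiring, assuming the current
internals (TableEnvironmentImpl#getPlanner, Planner#translate); the Hive
SQL -> Operation tree step is a hypothetical placeholder for the layer the
Hive connector would implement itself:

    import java.util.Collections;
    import java.util.List;

    import org.apache.flink.api.dag.Transformation;
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.internal.TableEnvironmentImpl;
    import org.apache.flink.table.delegation.Planner;
    import org.apache.flink.table.operations.ModifyOperation;

    public class OperationTreeSketch {
        public static void main(String[] args) {
            TableEnvironment env = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());

            // The downcast mentioned above: TableEnvironmentImpl already
            // exposes the planner, so no Table API change is needed.
            Planner planner = ((TableEnvironmentImpl) env).getPlanner();

            // Hypothetical placeholder for the Hive SQL -> Operation tree
            // layer the Hive connector would implement on its side.
            ModifyOperation operation =
                    hiveSqlToOperationTree("INSERT INTO t SELECT ...");

            // The planner turns the Operation tree into the runtime
            // Transformations, exactly as it does for the Table API DSL.
            List<Transformation<?>> transformations =
                    planner.translate(Collections.singletonList(operation));
        }

        private static ModifyOperation hiveSqlToOperationTree(String hiveSql) {
            throw new UnsupportedOperationException(
                    "illustrative placeholder, not implemented");
        }
    }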

The reasons why I think this is a better idea, rather than exposing Calcite
and RelNodes directly, are:

* It avoids the aforementioned downsides of exposing Calcite
* It doesn't require a new API to get you started
* It doesn't add complexity on the planner side; it just removes the
complexity of the existing coupling with Hive
* Letting another project use the Operation tree will harden it, make it
more stable, and eventually lead to it becoming public

The last point in particular is extremely interesting for the future of the
project, as having a stable public Operation tree will allow people to
implement other relational APIs on top of Flink SQL, or manipulate the AST
to define new semantics, or even crazier things we can't think of right now,
leading to a broader and more diverse ecosystem. At the end of the day, that
is exactly what Hive is doing right now: defining a new relational API on
top of the Flink table planner's functionality.

On Wed, Mar 30, 2022 at 4:45 PM 罗宇侠(莫辞)
<luoyuxia.luoyu...@alibaba-inc.com.invalid> wrote:

> Hi, I would like to explain a bit more about the current discussion[1] on
> the ways to decouple the Hive connector from the Flink planner.
>
> The background is that to support the Hive dialect, the Hive connector
> depends on the Flink planner: the current implementation generates a
> RelNode and then delivers the RelNode to Flink.
> But this also brings much complexity and maintenance burden, so we want to
> decouple the Hive connector from the Flink planner.
>
> There are two ways to do that:
> 1. Make the Hive parser just generate an Operation tree, like the Table API
> currently does.
>
> 2. Introduce a slim module called table-planner-spi which provides the
> Calcite dependency and exposes a limited set of public interfaces. Then,
> still generating Calcite RelNodes, the Hive connector will only require
> the slim module.
>
> The first way is the ideal one and we should go in that direction. But it
> will take much effort, for it requires rewriting all the code for the Hive
> dialect, and it's hard to do in one shot.
> And given that we want to move the Hive connector out in 1.16, it's more
> practical to decouple first and then migrate to the Operation tree.
> So FLIP-216[2] is for the second way. It explains the public interfaces to
> be exposed, all of which have already been implemented by PlannerContext.
>
> [1]
> https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
>
> Best regards,
> Yuxia
> ------------------------------------------------------------------
> From: 罗宇侠(莫辞)<luoyuxia.luoyu...@alibaba-inc.com.INVALID>
> Date: 2022-03-25 20:41:15
> To: dev@flink.apache.org<dev@flink.apache.org>
> Subject: [DISCUSS] FLIP-216 Decouple Hive connector with Flink planner
>
> Hi, everyone
>
> I would like to open a discussion about decoupling the Hive connector from
> the Flink table planner. It's a follow-up to the Hive syntax discussion[1],
> but it focuses only on how to decouple the Hive connector. The original doc
> is here[2], where you can see the detailed work and the heated discussion
> about exposing the Calcite API versus reusing the Operation tree to
> decouple. I have created FLIP-216: Decouple Hive connector with Flink
> planner[3] for it.
>
> Thanks and looking forward to a lively discussion!
>
> [1] https://lists.apache.org/thread/2w046dwl46tf2wy750gzmt0qrcz17z8t
> [2]
> https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
> [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
>
> Best regards,
> Yuxia
>
