Hi Martijn,

I have discussed this with Yuxia offline and improved the design again.
*Here are the improvements:*
1) All the public APIs stay in the flink-table-common or flink-table-api-java modules.
2) Rename "flink-table-planner-spi" to "flink-table-calcite-bridge", which only contains the Calcite dependency and internal classes (CalciteContext & CalciteQueryOperation). This module is only used by parser plugins to interact with Calcite APIs.

*And here is the migration plan:*
0) Expose the parser public APIs and introduce "flink-table-calcite-bridge".
1) Make the Hive connector planner-free (i.e. only depend on "flink-table-calcite-bridge"). ~ 1 week
2) Migrate the RelNode translation to the Operation tree one by one. ~ several months, may cross versions
3) Drop the "flink-table-calcite-bridge" module.
4) All the other 3rd-party dialects are not affected.

I think this can address your concerns about the final state of the APIs, while still allowing a smooth migration path that guarantees high-quality code.

What do you think about this?

Best,
Jark

On Wed, 13 Apr 2022 at 22:12, 罗宇侠(莫辞) <luoyuxia.luoyu...@alibaba-inc.com.invalid> wrote:

> Hi all,
> Sorry for the late reply in this thread.
> Decoupling the Hive connector is actually mainly about decoupling the Hive
> dialect. So I think it's a good time to introduce a pluggable dialect
> mechanism for Flink and make the Hive dialect the first one.
> Based on this point, I have updated FLIP-216 (Introduce pluggable
> dialect and plan for migrate Hive dialect)[1].
> The overview of the FLIP is as follows:
>
> 1: Introduce a slim module with limited public interfaces for the pluggable
> dialect. The public interfaces are in their final state, and other dialects
> should implement these interfaces and follow the style of converting a SQL
> statement into Flink's Operation tree.
>
> 2: A plan for migrating the Hive dialect. For implementing the Hive dialect,
> it is also expected to convert the SQL statement into Flink's Operation tree.
> But unfortunately, the current implementation converts the SQL statement
> into a Calcite RelNode.
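[Editor's note] To make the pluggable dialect idea concrete, here is a minimal, self-contained sketch of the target shape: a dialect parser that turns SQL text into an Operation tree, with no Calcite types in its contract. All names here (`DialectParser`, `ToyHiveDialectParser`, `DropTableOperation`) are hypothetical stand-ins, not the actual Flink interfaces; Flink's real `Operation` type with `asSummaryString()` lives in `org.apache.flink.table.operations`.

```java
import java.util.List;

// Hypothetical stand-in for Flink's Operation tree node.
interface Operation {
    String asSummaryString();
}

// One concrete operation, representing a DDL statement.
class DropTableOperation implements Operation {
    private final String tableName;

    DropTableOperation(String tableName) {
        this.tableName = tableName;
    }

    public String asSummaryString() {
        return "DROP TABLE " + tableName;
    }
}

// A pluggable dialect parser: SQL text in, Operation tree out.
// Note that no Calcite type appears anywhere in this contract.
interface DialectParser {
    List<Operation> parse(String statement);
}

// A toy "Hive dialect" plugin that handles exactly one statement shape.
class ToyHiveDialectParser implements DialectParser {
    public List<Operation> parse(String statement) {
        String s = statement.trim();
        if (s.toUpperCase().startsWith("DROP TABLE ")) {
            return List.of(new DropTableOperation(s.substring("DROP TABLE ".length())));
        }
        throw new UnsupportedOperationException("Not handled in this sketch: " + s);
    }
}

public class PluggableDialectSketch {
    public static void main(String[] args) {
        DialectParser parser = new ToyHiveDialectParser();
        Operation op = parser.parse("DROP TABLE t1").get(0);
        System.out.println(op.asSummaryString()); // prints "DROP TABLE t1"
    }
}
```

The point of the sketch is the dependency direction: the dialect plugin depends only on the public `Operation` contract, so the planner internals (and Calcite) stay invisible to it.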
> It's hard to migrate it to the Operation tree in one shot.
> It'll be better to migrate it step by step. So, the first step is to still
> keep the Calcite dependency and introduce an internal interface called
> CalciteContext to create Calcite RelNodes; then we can decouple from
> flink-table-planner. As a result, we can move the Hive connector out of the
> Flink repository. The second step is to migrate it to the Operation tree,
> so that we can drop the Calcite dependency.
>
> For some questions from previous emails:
>
>> What about Scala? Is flink-table-planner-spi going to be a scala module
>> with the related suffix?
> flink-table-planner-spi is going to be a Scala-free module; for the Hive
> dialect, it's fine not to expose the couple of types
> which are implemented in Scala.
>
>> Are you sure exposing the Calcite interfaces is going to be enough?
> I'm quite sure. As for specific methods like
> FlinkTypeFactory#toLogicalType, I think we can implement a similar
> type converter in the Hive connector itself.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A++Introduce+pluggable+dialect+and+plan+for+migrate+Hive+dialect
>
> Best regards,
> Yuxia
> ------------------------------------------------------------------
> From: Martijn Visser <martijnvis...@apache.org>
> Date: 2022-03-31 17:09:15
> To: dev <dev@flink.apache.org>
> Cc: 罗宇侠(莫辞) <luoyuxia.luoyu...@alibaba-inc.com.invalid>
> Subject: Re: [DISCUSS] FLIP-216 Decouple Hive connector with Flink planner
>
> Hi all,
>
> Thanks for opening this discussion. I agree with Francesco that we should
> go for the best solution instead of another bandaid or intermediate
> solution. I think there's actually consensus on that, given the argument
> provided by Yuxia:
>
> > The first way is the ideal way and we should go in that direction. But it
> > will take much effort, for it requires rewriting all the code about the
> > Hive dialect, and it's hard to do it in one shot.
> > And given we want to move the Hive connector out in 1.16, it's more
> > practical to decouple first and then migrate it to the Operation tree.
>
> Why should we first invest time in another intermediate solution and then
> spend time afterwards to actually get to the proper solution? I would
> propose to:
>
> - Remove the Hive connector for versions 1.*, 2.1.* and 2.2.* in Flink 1.16
> - Upgrade to the latest supported Hive 2.3.* and 3.1.* in Flink 1.16
> - Get as much work done as possible to decouple the Hive connector from
>   Flink in 1.16.
>
> There's customer value in this approach, because we'll support the latest
> supported Hive versions. If customers need support for older Hive versions,
> they can still use older Flink versions. There is also community value in
> this approach, because we're actively working on making our codebase
> maintainable. If the entire decoupling is not possible, then we won't move
> the Hive 2.3.* and 3.1.* connector out in Flink 1.16, but in Flink 1.17.
> Support for the new Hive version 4.* could then also be added in Flink 1.17
> (we should not add support for newer versions of Hive until this decoupling
> has been finished).
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
>
> On Wed, 30 Mar 2022 at 17:07, Francesco Guardiani <france...@ververica.com>
> wrote:
>
> > Sorry, I replied on the wrong thread, so I repost my answer here :)
> >
> > As there was already a discussion in the doc, I'll just summarize my
> > opinions here on the proposed execution of this FLIP.
> >
> > I think we should rather avoid exposing internal details, which I
> > consider Calcite to be part of, and instead reuse what we already have to
> > define an AST from the Table API, which is what I'll refer to in this
> > mail as the Operation tree.
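[Editor's note] For readers unfamiliar with the concept, the following self-contained sketch shows the flavor of such an AST: relational nodes with children that a planner can walk. The names are hypothetical simplifications; Flink's real counterparts (`QueryOperation`, `ProjectQueryOperation`, `SourceQueryOperation`) live in `org.apache.flink.table.operations` and carry schemas, expressions, and more.

```java
import java.util.List;

// Hypothetical mini version of a relational AST ("Operation tree").
interface QueryOperation {
    List<QueryOperation> getChildren();
    String asSummaryString();
}

// Leaf node: a scan of a named table.
class SourceQueryOperation implements QueryOperation {
    private final String table;

    SourceQueryOperation(String table) {
        this.table = table;
    }

    public List<QueryOperation> getChildren() {
        return List.of();
    }

    public String asSummaryString() {
        return "Source(" + table + ")";
    }
}

// Unary node: a projection of fields over a child operation.
class ProjectQueryOperation implements QueryOperation {
    private final List<String> fields;
    private final QueryOperation child;

    ProjectQueryOperation(List<String> fields, QueryOperation child) {
        this.fields = fields;
        this.child = child;
    }

    public List<QueryOperation> getChildren() {
        return List.of(child);
    }

    public String asSummaryString() {
        return "Project" + fields;
    }
}

public class OperationTreeSketch {
    // A planner-style recursive walk: render the tree, children indented.
    public static String dump(QueryOperation op, String indent) {
        StringBuilder sb = new StringBuilder(indent).append(op.asSummaryString()).append('\n');
        for (QueryOperation child : op.getChildren()) {
            sb.append(dump(child, indent + "  "));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Roughly what "SELECT a, b FROM t" would become in such an AST.
        QueryOperation tree =
                new ProjectQueryOperation(List.of("a", "b"), new SourceQueryOperation("t"));
        System.out.print(dump(tree, ""));
        // prints:
        // Project[a, b]
        //   Source(t)
    }
}
```

A dialect plugin that emits trees like this hands the planner a structure it already knows how to optimize and translate, which is the crux of Francesco's argument below.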
> >
> > First of all, the reason I think this FLIP is not a good idea is that it
> > proposes to expose types out of our control, so an API we cannot control
> > and may realistically never be able to stabilize. A Calcite bump in the
> > table project is already pretty hard today, as shown by tasks like
> > https://github.com/apache/flink/pull/13577. This will make them even
> > harder. Essentially it will couple us to Calcite even more, and create a
> > different but still big maintenance/complexity burden we would like to
> > get rid of with this FLIP.
> >
> > There are also some technical aspects that seem to me a bit overlooked
> > here:
> >
> > * What about Scala? Is flink-table-planner-spi going to be a Scala module
> > with the related suffix? Because I see you want to expose a couple of
> > types which we have implemented with Scala right now, and making this
> > module Scala-dependent makes it even more complicated to ship both the
> > modules that use it and flink-table-planner-loader.
> > * Are you sure exposing the Calcite interfaces is going to be enough?
> > Don't you also require some instance-specific methods? E.g.
> > FlinkTypeFactory#toLogicalType? What if at some point you need to expose
> > something like FlinkTypeFactory? How do you plan to support it and
> > stabilize it in the long term?
> >
> > Now let me talk a bit about the Operation tree. For those who don't know
> > what it is, it's the pure Flink AST for defining DML, used for converting
> > the Table API DSL to an AST the planner can manipulate. Essentially, it's
> > our own version of the RelNode tree/RexNode tree. This Operation tree can
> > already be used by Hive, without any API changes on the Table side. You
> > just need a downcast of TableEnvironmentImpl to use getPlanner() and use
> > Planner#translate, or alternatively you can add getPlanner to
> > TableEnvironmentInternal directly.
> > From what I've seen about your use case, and please correct me if I'm
> > wrong, you can implement your SQL -> Operation tree layer without
> > substantial changes on both sides.
> >
> > The reason why I think this is a better idea, rather than exposing
> > Calcite and RelNodes directly, is:
> >
> > * The aforementioned downsides of exposing Calcite
> > * It doesn't require a new API to get started with it
> > * It doesn't add complexity on the planner side; it just removes the
> > existing coupling with Hive
> > * Letting another project use the Operation tree will harden it, make it
> > more stable and eventually lead to it becoming public
> >
> > The last point in particular is extremely interesting for the future of
> > the project, as having a stable public Operation tree will allow people
> > to implement other relational-based APIs on top of Flink SQL, or
> > manipulate the AST to define new semantics, or even crazier things we
> > can't think of right now, leading to a broader and more diverse
> > ecosystem. Which is exactly what Hive is doing right now at the end of
> > the day: defining a new relational API on top of the Flink Table planner
> > functionalities.
> >
> > On Wed, Mar 30, 2022 at 4:45 PM 罗宇侠(莫辞)
> > <luoyuxia.luoyu...@alibaba-inc.com.invalid> wrote:
> >
> > > Hi, I would like to explain a bit more about the current discussion[1]
> > > of the ways to decouple the Hive connector from the Flink planner.
> > >
> > > The background is supporting the Hive dialect: the Hive connector
> > > depends on the Flink planner because the current implementation
> > > generates RelNodes and then delivers those RelNodes to Flink.
> > > But this also brings much complexity and maintenance burden, so we want
> > > to decouple the Hive connector from the Flink planner.
> > >
> > > There are two ways to do that:
> > > 1. Make the Hive parser just generate an Operation tree like the Table
> > > API currently does.
> > >
> > > 2.
> > > Introduce a slim module called flink-table-planner-spi which provides
> > > the Calcite dependency and exposes limited public interfaces. Then,
> > > still generating Calcite RelNodes, the Hive connector will only require
> > > the slim module.
> > >
> > > The first way is the ideal way, and we should go in that direction. But
> > > it will take much effort, for it requires rewriting all the code about
> > > the Hive dialect, and it's hard to do it in one shot.
> > > And given we want to move the Hive connector out in 1.16, it's more
> > > practical to decouple first and then migrate it to the Operation tree.
> > > So, FLIP-216[2] is for the second way. It explains the public
> > > interfaces to be exposed, all of which have been implemented by
> > > PlannerContext.
> > >
> > > [1]
> > > https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
> > > [2]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
> > >
> > > Best regards,
> > > Yuxia
> > > ------------------------------------------------------------------
> > > From: 罗宇侠(莫辞) <luoyuxia.luoyu...@alibaba-inc.com.INVALID>
> > > Date: 2022-03-25 20:41:15
> > > To: dev@flink.apache.org <dev@flink.apache.org>
> > > Subject: [DISCUSS] FLIP-216 Decouple Hive connector with Flink planner
> > >
> > > Hi everyone,
> > >
> > > I would like to open a discussion about decoupling the Hive connector
> > > from the Flink table planner. It's a follow-up discussion after the
> > > Hive syntax discussion[1], but it only focuses on how to decouple the
> > > Hive connector. The original doc is here[2], from which you can see the
> > > detailed work and heated discussion about exposing Calcite APIs versus
> > > reusing the Operation tree to decouple.
> > > I have created FLIP-216: Decouple Hive connector with Flink planner[3]
> > > for it.
> > >
> > > Thanks, and looking forward to a lively discussion!
> > >
> > > [1] https://lists.apache.org/thread/2w046dwl46tf2wy750gzmt0qrcz17z8t
> > > [2]
> > > https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
> > > [3]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
> > >
> > > Best regards,
> > > Yuxia