Hi Jark and Yuxia, I think this sounds like a good idea, +1. Thanks a lot for taking the time to invest in this.
Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

On Tue, 19 Apr 2022 at 17:30, Jark Wu <imj...@gmail.com> wrote:

> Hi Martijn,
>
> I have discussed this with Yuxia offline and improved the design again.
>
> *Here are the improvements:*
> 1) All the public APIs stay in the flink-table-common or
> flink-table-api-java modules.
> 2) Rename "flink-table-planner-spi" to "flink-table-calcite-bridge", which
> only contains the Calcite dependency and internal classes (CalciteContext &
> CalciteQueryOperation). This module is only used by parser plugins to
> interact with Calcite APIs.
>
> *And here is the migration plan:*
> 0) Expose the parser public APIs and introduce "flink-table-calcite-bridge".
> 1) Make the Hive connector planner-free (i.e. only depend on
> "flink-table-calcite-bridge"). ~ 1 week
> 2) Migrate the RelNode translation to the Operation tree one by one. ~ several
> months, may cross versions
> 3) Drop the "flink-table-calcite-bridge" module.
> 4) All the other 3rd-party dialects are not affected.
>
> I think this can address your concerns about the final state of the APIs, and
> it also allows us a smooth migration path that guarantees high-quality code.
>
> What do you think about this?
>
> Best,
> Jark
>
>
> On Wed, 13 Apr 2022 at 22:12, 罗宇侠(莫辞)
> <luoyuxia.luoyu...@alibaba-inc.com.invalid> wrote:
>
>> Hi all,
>> Sorry for the late reply to this thread.
>> About decoupling the Hive connector: it is actually mainly about decoupling
>> the Hive dialect. So I think this is a good time to introduce a pluggable
>> dialect mechanism for Flink and make the Hive dialect the first one.
>> Based on this point, I have updated FLIP-216 (Introduce pluggable
>> dialect and plan for migrate Hive dialect)[1].
>> The overview of the FLIP is as follows:
>>
>> 1: Introduce a slim module with limited public interfaces for the pluggable
>> dialect.
>> The public interfaces are in their final state; other dialects should
>> implement these interfaces and follow the style of converting a SQL statement
>> to Flink's Operation tree.
>>
>> 2: Plan for migrating the Hive dialect. The Hive dialect implementation is
>> also expected to convert SQL statements to Flink's Operation tree. But
>> unfortunately, the current implementation converts SQL statements to
>> Calcite RelNodes. It's hard to migrate it to the Operation tree in one shot;
>> it'll be better to migrate it step by step. So the first step is to still
>> keep the Calcite dependency and introduce an internal interface called
>> CalciteContext to create Calcite RelNodes; then we can decouple from
>> flink-table-planner. As a result, we can move the Hive connector out of the
>> Flink repository. The second step is to migrate it to the Operation tree, so
>> that we can drop the Calcite dependency.
>>
>> For some questions from previous emails:
>>
>> What about Scala? Is flink-table-planner-spi going to be a Scala
>> module with the related suffix?
>> flink-table-planner-spi is going to be a Scala-free module; for the Hive
>> dialect, it's fine not to expose the couple of types
>> that are implemented in Scala.
>>
>> Are you sure exposing the Calcite interfaces is going to be enough?
>> I'm quite sure. As for specific methods like
>> FlinkTypeFactory#toLogicalType, I think we can implement a similar
>> type converter in the Hive connector itself.
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A++Introduce+pluggable+dialect+and+plan+for+migrate+Hive+dialect
>>
>> Best regards,
>> Yuxia
>> ------------------------------------------------------------------
>> From: Martijn Visser <martijnvis...@apache.org>
>> Date: 2022-03-31 17:09:15
>> To: dev <dev@flink.apache.org>
>> Cc: 罗宇侠(莫辞) <luoyuxia.luoyu...@alibaba-inc.com.invalid>
>> Subject: Re: [DISCUSS] FLIP-216 Decouple Hive connector with Flink planner
>>
>> Hi all,
>>
>> Thanks for opening this discussion.
>> I agree with Francesco that we should
>> go for the best solution instead of another band-aid or intermediate
>> solution. I think there's actually consensus for that, given the argument
>> provided by Yuxia:
>>
>> > The first way is the ideal way and we should go in that direction. But it
>> > will take much effort, for it requires rewriting all the code for the Hive
>> > dialect and it's hard to do it in one shot. And given we want to move out
>> > the Hive connector in 1.16, it's more practical to decouple first, and then
>> > migrate it to the Operation tree.
>>
>> Why should we first invest time in another intermediate solution and then
>> spend time afterwards to actually get to the proper solution? I would
>> propose to:
>>
>> - Remove the Hive connectors for versions 1.*, 2.1.* and 2.2.* in Flink 1.16
>> - Upgrade to the latest supported Hive 2.3.* and 3.1.* in Flink 1.16
>> - Get as much work done as possible to decouple the Hive connector from
>> Flink in 1.16.
>>
>> There's customer value in this approach, because we'll support the latest
>> supported Hive versions. If customers need support for older Hive versions,
>> they can still use older Flink versions. There is also community value in
>> this approach, because we're actively working on making our codebase
>> maintainable. If the entire decoupling is not possible, then we won't move
>> the Hive 2.3.* and 3.1.* connector out in Flink 1.16, but in Flink 1.17.
>> Support for the new Hive version 4.* could then also be added in Flink 1.17
>> (we should not add support for newer versions of Hive until this decoupling
>> has been finished).
>>
>> Best regards,
>>
>> Martijn Visser
>> https://twitter.com/MartijnVisser82
>> https://github.com/MartijnVisser
>>
>>
>> On Wed, 30 Mar 2022 at 17:07, Francesco Guardiani <france...@ververica.com>
>> wrote:
>>
>> > Sorry, I replied on the wrong thread, so I repost my answer here :)
>> >
>> > As there was already a discussion in the doc, I'll just summarize my
>> > opinions here on the proposed execution of this FLIP.
>> >
>> > I think we should avoid exposing internal details, which I consider
>> > Calcite to be part of, and rather reuse what we already have to define an
>> > AST from the Table API, which is what I'll refer to in this mail as the
>> > Operation tree.
>> >
>> > First of all, the reason I think this FLIP is not a good idea is that it
>> > proposes to expose types out of our control, hence an API we cannot
>> > control and may realistically never be able to stabilize. A Calcite bump
>> > in the table project is already pretty hard today, as shown by tasks like
>> > https://github.com/apache/flink/pull/13577. This will make them even
>> > harder. Essentially it will couple us to Calcite even more, and create a
>> > different but still big maintenance/complexity burden we would like to get
>> > rid of with this FLIP.
>> >
>> > There are also some technical aspects that seem to me a bit overlooked
>> > here:
>> >
>> > * What about Scala? Is flink-table-planner-spi going to be a Scala module
>> > with the related suffix? Because I see you want to expose a couple of
>> > types which we have implemented with Scala right now, and making this
>> > module Scala-dependent makes it even more complicated to ship both the
>> > modules that use it and flink-table-planner-loader.
>> > * Are you sure exposing the Calcite interfaces is going to be enough?
>> > Don't you also require some instance-specific methods? E.g.
>> > FlinkTypeFactory#toLogicalType? What if at some point you need to expose
>> > something like FlinkTypeFactory? How do you plan to support it and
How do you plan to support it and >> > stabilize it in the long term? >> > >> > Now let me talk a bit about the Operation tree. For who doesn't know >> what >> > it is, it's the pure Flink AST for defining DML, used for converting the >> > Table API DSL to an AST the planner can manipulate. Essentially, it's >> our >> > own version of the RelNode tree/RexNode tree. This operation tree can be >> > used already by Hive, without any API changes on Table side. You just >> need >> > a downcast of TableEnvironmentImpl to use getPlanner() and use >> > Planner#translate, or alternatively you can add getPlanner to >> > TableEnvironmentInternal directly. From what I've seen about your use >> case, >> > and please correct me if I'm wrong, you can implement your SQL -> >> Operation >> > tree layer without substantial changes on both sides. >> > >> > The reason why I think this is a better idea, rather than exposing >> Calcite >> > and RelNodes directly, is: >> > >> > * Aforementioned downsides of exposing Calcite >> > * It doesn't require a new API to get you started with it >> > * Doesn't add complexity on planner side, just removes it from the >> existing >> > coupling with hive >> > * Letting another project use the Operation tree will harden it, make it >> > more stable and eventually lead to become public >> > >> > The last point in particular is extremely interesting for the future of >> the >> > project, as having a stable public Operation tree will allow people to >> > implement other relational based APIs on top of Flink SQL, or manipulate >> > the AST to define new semantics, or even more crazy things we can't >> think >> > of right now, leading to a broader bigger and more diverse ecosystem. >> Which >> > is exactly what Hive is doing right now at the end of the day, define a >> new >> > relational API on top of the Flink Table planner functionalities. 
>> >
>> > On Wed, Mar 30, 2022 at 4:45 PM 罗宇侠(莫辞)
>> > <luoyuxia.luoyu...@alibaba-inc.com.invalid> wrote:
>> >
>> > > Hi, I would like to explain a bit more about the current discussion[1]
>> > > on the ways to decouple the Hive connector from the Flink planner.
>> > >
>> > > The background is that, to support the Hive dialect, the Hive connector
>> > > depends on the Flink planner, because the current implementation
>> > > generates RelNodes and then delivers the RelNodes to Flink.
>> > > But this also brings much complexity and maintenance burden, so we want
>> > > to decouple the Hive connector from the Flink planner.
>> > >
>> > > There are two ways to do that:
>> > > 1. Make the Hive parser just generate an Operation tree, like the Table
>> > > API currently does.
>> > >
>> > > 2. Introduce a slim module called flink-table-planner-spi which provides
>> > > the Calcite dependency and exposes limited public interfaces. Then,
>> > > while still generating Calcite RelNodes, the Hive connector will only
>> > > require the slim module.
>> > >
>> > > The first way is the ideal way and we should go in that direction. But
>> > > it will take much effort, for it requires rewriting all the code for the
>> > > Hive dialect and it's hard to do it in one shot.
>> > > And given we want to move out the Hive connector in 1.16, it's more
>> > > practical to decouple first, and then migrate it to the Operation tree.
>> > > So, FLIP-216[2] is for the second way. It explains the public
>> > > interfaces to be exposed, all of which have been implemented by
>> > > PlannerContext.
>> > >
>> > > [1]
>> > > https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
>> > > [2]
>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
>> > >
>> > > Best regards,
>> > > Yuxia
>> > > ------------------------------------------------------------------
>> > > From: 罗宇侠(莫辞) <luoyuxia.luoyu...@alibaba-inc.com.INVALID>
>> > > Date: 2022-03-25 20:41:15
>> > > To: dev@flink.apache.org <dev@flink.apache.org>
>> > > Subject: [DISCUSS] FLIP-216 Decouple Hive connector with Flink planner
>> > >
>> > > Hi everyone,
>> > >
>> > > I would like to open a discussion about decoupling the Hive connector
>> > > from the Flink table planner. It's a follow-up discussion after the Hive
>> > > syntax discussion[1], but focused only on how to decouple the Hive
>> > > connector. The original doc is here[2], from which you can see the
>> > > detailed work and the heated discussion about exposing Calcite APIs or
>> > > reusing the Operation tree to decouple.
>> > > I have created FLIP-216: Decouple Hive connector with Flink planner[3]
>> > > for it.
>> > >
>> > > Thanks, and looking forward to a lively discussion!
>> > >
>> > > [1] https://lists.apache.org/thread/2w046dwl46tf2wy750gzmt0qrcz17z8t
>> > > [2]
>> > > https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
>> > > [3]
>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
>> > >
>> > > Best regards,
>> > > Yuxia
>> > >
>> >
>>
>>
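[Editor's illustrative sketch: the "slim bridge module" of the second way discussed in this thread, in the same toy style as above. All types here (RelNode, CalciteContext, ToyPlannerContext, ToyDialectPlanBuilder) are hypothetical stand-ins; the real bridge would expose actual Calcite types and richer planner services, and its interfaces may look quite different.]

```java
/** Toy stand-in for Calcite's RelNode; the real bridge would expose actual Calcite types. */
interface RelNode {
    String explain();
}

/**
 * Hypothetical shape of the CalciteContext idea: planner-owned services the
 * external dialect parser can call without a compile-time dependency on
 * flink-table-planner.
 */
interface CalciteContext {
    RelNode createTableScan(String tableName);
}

/** Toy planner-side implementation, handed to the dialect parser at runtime. */
final class ToyPlannerContext implements CalciteContext {
    @Override
    public RelNode createTableScan(String tableName) {
        // A real planner would build a Calcite LogicalTableScan here.
        return () -> "LogicalTableScan(table=[" + tableName + "])";
    }
}

/** A dialect-side plan builder compiled only against the bridge interfaces above. */
final class ToyDialectPlanBuilder {
    private final CalciteContext context;

    ToyDialectPlanBuilder(CalciteContext context) {
        this.context = context;
    }

    RelNode toRelNode(String sql) {
        // A real parser would translate arbitrary statements; this toy only
        // handles "SELECT * FROM <table>" to show the dependency direction.
        String table = sql.trim().substring("SELECT * FROM ".length());
        return context.createTableScan(table);
    }
}

public class CalciteBridgeSketch {
    public static void main(String[] args) {
        ToyDialectPlanBuilder builder = new ToyDialectPlanBuilder(new ToyPlannerContext());
        System.out.println(builder.toRelNode("SELECT * FROM orders").explain());
    }
}
```

Here the dialect parser compiles only against the bridge interfaces, which is what would let the Hive connector drop its dependency on flink-table-planner while still producing RelNodes during the interim migration phase.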