Hi Jingsong, Godfrey and Jark, I have updated the FLIP to incorporate your suggestions. Please let me know if you have further comments. Thank you.
On Wed, Feb 3, 2021 at 4:51 PM Rui Li <lirui.fu...@gmail.com> wrote: > Thanks Godfrey & Jark for your comments! > > Since both of you mentioned the naming of the parser factory, I'll answer > this question first. > > I only intend to introduce the pluggable Parser for blink planner. So > the BlinkParserFactory will be added to the flink-table-planner-blink > module. We already have several factory classes there such as > "BlinkPlannerFactory" and "BlinkExecutorFactory". So it seems > "BlinkParserFactory" is following the naming convention of existing > classes. But I'm also fine with "ParserFactory" if we're going to remove > the "blink" prefix anyway. > For the concrete implementation, I prefer "DefaultParserFactory". Having a > "FlinkParserFactory" in blink planner seems a little weird. We can revisit > this when we really get rid of the legacy planner. > > To answer the other questions: > > @Godfrey > Yes I agree DDLs should also be handled by the HiveParser. That'll give > users more consistent experience. > > The "Go Beyond Hive" section is mainly about decoupling the dialect from > hive objects, so that users can write HiveQL to query non-hive tables or > call non-hive functions. I think it's in the scope of this FLIP, but it > doesn't have to be done in the first release. I'll update the FLIP to be > more specific about it. > > @Jark > Creating a new Parser only when dialect changes is a good idea. I'll > update the FLIP to mention that. > > The 3.x new features we need to support in this FLIP include: > > 1. PK & NOT NULL constraints > 2. Alter DB to change location > > These DDLs are already covered by FLIP-123. Missing them will be a > regression. > > Other new features can be identified with more tests and user feedback, > and will be supported incrementally. > > By "we can use a newer version to support older versions", I mean syntax > in the new version is a superset of the old one. But we still need extra > efforts to make sure the copied code works with different hive versions, > e.g. more shim methods are required. So I'd rather start with a popular > version than the newest version. > > > On Wed, Feb 3, 2021 at 11:51 AM Jark Wu <imj...@gmail.com> wrote: > >> Thanks Rui for the great proposal, I believe this can be very attractive >> for many Hive users. >> >> The FLIP looks good to me in general, I only have some minor comments: >> >> 1) BlinkParserFactory >> IIUC, BlinkParserFactory is located in the flink-table-api-java module >> with >> the Parser interface there. >> I suggest renaming it to `ParserFactory`, because it creates Parser >> instead >> of BlinkParser. >> And the implementations can be `HiveParserFactory` and >> `FlinkParserFactory`. >> I think we should avoid the `Blink` keyword in interfaces, blink planner >> is >> already the default planner and >> the old planner will be removed in the near future. There will be no >> `blink` in the future then. >> >> 2) "create a new instance each time getParser is called" >> Finding parser for every time getParser is called sounds heavy to me. I >> think we can improve this by simplify >> caching the Parser instance, and creating a new one if current >> sql-dialect >> is different from the cached Parser. >> >> 3) Hive version >> How much code needs to be done to support new features in 3.x based on >> 2.x? >> Is this also included in this FLIP/release? >> I don't fully understand this because the FLIP says "we can use a newer >> version to support older versions." >> >> Best, >> Jark >> >> On Wed, 3 Feb 2021 at 11:48, godfrey he <godfre...@gmail.com> wrote: >> >> > Thanks for bringing up the discussion, Rui! >> > >> > Regarding the DDL part in the "Introduce HiveParser" section, >> > I would like to choose the second option. Because if we could >> > use one hive parser to parse all hive SQLs, we need not to copy >> > Calcite parser code, and the framework and the code will be very simple. >> > >> > Regarding the "Go Beyond Hive" section, is that the scope of this FLIP ? >> > Could you list all the extensions and give some examples ? >> > >> > One minor suggestion about the name of ParserImplFactory. >> > How about renaming ParserImplFactory to DefaultParserFactory ? >> > >> > Best, >> > Godfrey >> > >> > Rui Li <lirui.fu...@gmail.com> 于2021年2月3日周三 上午11:16写道: >> > >> > > Hi Jingsong, >> > > >> > > Thanks for your comments and they're very good questions. >> > > >> > > Regarding # Version, we need to do some tradeoff here. Choosing the >> > latest >> > > 3.x will cover all the features we want to support. But as you said, >> 3.x >> > > and 2.x can have some differences and requires more efforts to support >> > > lower versions. I decided to pick 2.x and evolve from there to support >> > new >> > > features in 3.x. Because I think most hive users, especially those who >> > are >> > > likely to be interested in this feature, are still using 2.x or even >> 1.x. >> > > So the priority is to cover 2.x and 1.x first. >> > > >> > > Regarding # Hive Codes, in my PoC, I simply copy the code and make as >> few >> > > changes as possible. I believe we can do some clean up or refactor to >> > > reduce it. With that in mind, I expect it to be over 10k lines of java >> > > code, and even more if we count ANTLR grammar files as well. >> > > >> > > Regarding # Functions, you're right that HiveModule is more of a >> solution >> > > than limitation. I just want to emphasize that HiveModule and hive >> > dialect >> > > need to be used together to achieve better compatibility. >> > > >> > > Regarding # Keywords, new hive versions can have more reserved >> keywords >> > > than old versions. Since we're based on hive 2.x code, it may not >> provide >> > > 100% keyword-compatibility to 1.x users. But I expect it to be good >> > enough >> > > for most cases. If not, we can provide different grammar files for >> lower >> > > versions. >> > > >> > > On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li <jingsongl...@gmail.com> >> > wrote: >> > > >> > > > Thanks Rui for the proposal, I think this FLIP is required by many >> > users, >> > > > and it is very good to traditional Hive users. I have some >> confusion: >> > > > >> > > > # Version >> > > > >> > > > Which Hive version do you want to choose? Maybe, Hive 3.X and Hive >> 2.X >> > > have >> > > > some differences? >> > > > >> > > > # Hive Codes >> > > > >> > > > Can you evaluate how much code we need to copy to our >> > > flink-hive-connector? >> > > > Do we need to change them? We need to maintain them anyway. >> > > > >> > > > # Functions >> > > > >> > > > About Hive functions, I don't think it is a limitation, we are using >> > > > HiveModule to be compatible with Hive, right? So it is a solution >> > instead >> > > > of a limitation. >> > > > >> > > > # Keywords >> > > > >> > > > Do you think there will be a keyword problem? Or can we be 100% >> > > compatible >> > > > with Hive? >> > > > >> > > > On the whole, the FLIP looks very good and I'm looking forward to >> it. >> > > > >> > > > Best, >> > > > Jingsong >> > > > >> > > > On Fri, Dec 11, 2020 at 11:35 AM Zhijiang >> > > > <wangzhijiang...@aliyun.com.invalid> wrote: >> > > > >> > > > > Thanks for the further info and explanations! I have no other >> > concerns. >> > > > > >> > > > > Best, >> > > > > Zhijiang >> > > > > >> > > > > >> > > > > ------------------------------------------------------------------ >> > > > > From:Rui Li <lirui.fu...@gmail.com> >> > > > > Send Time:2020年12月10日(星期四) 20:35 >> > > > > To:dev <dev@flink.apache.org>; Zhijiang < >> wangzhijiang...@aliyun.com> >> > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility >> > > > > >> > > > > Hi Zhijiang, >> > > > > >> > > > > Glad to know you're interested in this FLIP. I wouldn't claim 100% >> > > > > compatibility with this FLIP. That's because Flink doesn't have >> the >> > > > > functionalities to support all Hive's features. To list a few >> > examples: >> > > > > >> > > > > 1. Hive allows users to process data with shell scripts -- very >> > > > similar >> > > > > to UDFs [1] >> > > > > 2. Users can compile inline Groovy UDFs and use them in queries >> > [2] >> > > > > 3. Users can dynamically add/delete jars, or even execute >> > arbitrary >> > > > > shell command [3] >> > > > > >> > > > > These features cannot be supported merely by a parser/planner, and >> > it's >> > > > > open to discussion whether Flink even should support them at all. >> > > > > >> > > > > So the ultimate goal of this FLIP is to provide Hive syntax >> > > compatibility >> > > > > to features that are already available in Flink, which I believe >> will >> > > > cover >> > > > > most common use cases. >> > > > > >> > > > > [1] >> > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples >> > > > > [2] >> > > > > >> > > > > >> > > > >> > > >> > >> https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060 >> > > > > [3] >> > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands >> > > > > >> > > > > On Thu, Dec 10, 2020 at 6:11 PM Zhijiang < >> wangzhijiang...@aliyun.com >> > > > > .invalid> >> > > > > wrote: >> > > > > >> > > > > > Thanks for launching the discussion and the FLIP, Rui! >> > > > > > >> > > > > > It is really nice to see our continuous efforts for >> compatibility >> > > with >> > > > > > Hive and benefiting users in this area. >> > > > > > I am only curious that are there any other compatible >> limitations >> > for >> > > > > Hive >> > > > > > users after this FLIP? Or can I say that the Hive compatibility >> is >> > > > > > completely resolved after this FLIP? >> > > > > > I am interested in the ultimate goal in this area. Maybe it is >> out >> > of >> > > > > this >> > > > > > FLIP scope, but still wish some insights from you if possible. >> :) >> > > > > > >> > > > > > Best, >> > > > > > Zhijiang >> > > > > > >> > > > > > >> > > > > > >> ------------------------------------------------------------------ >> > > > > > From:Rui Li <lirui.fu...@gmail.com> >> > > > > > Send Time:2020年12月10日(星期四) 16:46 >> > > > > > To:dev <dev@flink.apache.org> >> > > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility >> > > > > > >> > > > > > Thanks Kurt for your inputs! >> > > > > > >> > > > > > I agree we should extend Hive code to support non-Hive tables. I >> > have >> > > > > > updated the wiki page to remove the limitations you mentioned, >> and >> > > add >> > > > > > typical use cases in the "Motivation" section. >> > > > > > >> > > > > > Regarding comment #b, the interface is defined in >> > > > > flink-table-planner-blink >> > > > > > and only used by the blink planner. So I think >> "BlinkParserFactory" >> > > is >> > > > a >> > > > > > better name, WDYT? >> > > > > > >> > > > > > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <ykt...@gmail.com> >> > wrote: >> > > > > > >> > > > > > > Thanks Rui for starting this discussion. >> > > > > > > >> > > > > > > I can see the benefit that we improve hive compatibility >> further, >> > > as >> > > > > > quite >> > > > > > > some users are asking for this >> > > > > > > feature in mailing lists [1][2][3] and some online chatting >> tools >> > > > such >> > > > > as >> > > > > > > DingTalk. >> > > > > > > >> > > > > > > I have 3 comments regarding to the design doc: >> > > > > > > >> > > > > > > a) Could you add a section to describe the typical use case >> you >> > > want >> > > > to >> > > > > > > support after this feature is introduced? >> > > > > > > In that way, users can also have an impression how to use this >> > > > feature >> > > > > > and >> > > > > > > what the behavior and outcome will be. >> > > > > > > >> > > > > > > b) Regarding the naming: "BlinkParserFactory", I suggest >> renaming >> > > it >> > > > to >> > > > > > > "FlinkParserFactory". >> > > > > > > >> > > > > > > c) About the two limitations you mentioned: >> > > > > > > 1. Only works with Hive tables and the current catalog >> needs >> > to >> > > > be >> > > > > a >> > > > > > > HiveCatalog. >> > > > > > > 2. Queries cannot involve tables/views from multiple >> > catalogs. >> > > > > > > I assume this is because hive parser and analyzer doesn't >> support >> > > > > > > referring to a name with "x.y.z" fashion? Since >> > > > > > > we can control all the behaviors by leveraging the codes hive >> > > > currently >> > > > > > > use. Is it possible that we can remove such >> > > > > > > limitations? The reason is I'm not sure if users can make the >> > whole >> > > > > story >> > > > > > > work purely depending on hive catalog (that's >> > > > > > > the reason why I gave comment #a). If multiple catalogs are >> > > involved, >> > > > > > with >> > > > > > > this limitation I don't think any meaningful >> > > > > > > pipeline could be built. For example, users want to stream >> data >> > > from >> > > > > > Kafka >> > > > > > > to Hive, fully use hive's dialect including >> > > > > > > query part. The kafka table could be a temporary table or >> saved >> > in >> > > > > > default >> > > > > > > memory catalog. >> > > > > > > >> > > > > > > >> > > > > > > [1] >> > > > http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118 >> > > > > > > [2] >> > > > > > >> > > >> http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html >> > > > > > > [3] >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html >> > > > > > > >> > > > > > > Best, >> > > > > > > Kurt >> > > > > > > >> > > > > > > >> > > > > > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <lirui.fu...@gmail.com >> > >> > > > wrote: >> > > > > > > >> > > > > > > > Hi guys, >> > > > > > > > >> > > > > > > > I'd like to start a discussion about providing HiveQL >> > > compatibility >> > > > > for >> > > > > > > > users connecting to a hive warehouse. FLIP-123 has already >> > > covered >> > > > > most >> > > > > > > > DDLs. So now it's time to complement the other big missing >> part >> > > -- >> > > > > > > queries. >> > > > > > > > With FLIP-152, the hive dialect covers more scenarios and >> makes >> > > it >> > > > > even >> > > > > > > > easier for users to migrate to Flink. More details are in >> the >> > > FLIP >> > > > > wiki >> > > > > > > > page [1]. Looking forward to your feedback! >> > > > > > > > >> > > > > > > > [1] >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility >> > > > > > > > >> > > > > > > > -- >> > > > > > > > Best regards! >> > > > > > > > Rui Li >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > > -- >> > > > > > Best regards! >> > > > > > Rui Li >> > > > > > >> > > > > > >> > > > > >> > > > > -- >> > > > > Best regards! >> > > > > Rui Li >> > > > > >> > > > > >> > > > >> > > > -- >> > > > Best, Jingsong Lee >> > > > >> > > >> > > >> > > -- >> > > Best regards! >> > > Rui Li >> > > >> > >> > > > -- > Best regards! > Rui Li > -- Best regards! Rui Li