Hi Butao,

Thanks for reminding us about the issue. +1 for the approach.

Thanks,
Okumin

On Mon, Jan 8, 2024 at 1:38 PM Butao Zhang <butaozha...@163.com> wrote:
>
> Hi, dev
> Bump this thread!
> I just filed a ticket to track and fix the incompatibility issues in the HMS
> column stats Thrift API. I think we can fix this at its root on the HMS side,
> and then third-party components will not suffer from this issue.
> https://issues.apache.org/jira/browse/HIVE-27984
>
>
> Thanks,
> Butao Zhang
> ---- Replied Message ----
> | From | Okumin<m...@okumin.com> |
> | Date | 8/20/2023 18:37 |
> | To | <dev@hive.apache.org> |
> | Subject | Re: How to use `engine` introduced by HIVE-22046 |
> Hi Butao,
>
> Thanks for sharing your PR! I didn't find trinodb/trino-hive-apache
> or trinodb/hive-thrift.
>
> As mentioned in the PR, the current Thrift definitions might not be the
> final version, but it sounds reasonable to give information to external
> products since we versioned Hive 4 beta. I'm curious if anyone knows why we
> give different engine names to Hive and Impala and what the recommended
> options are.
>
> Thanks,
> Okumin
>
>
>
> On Fri, Aug 11, 2023 at 10:39 AM Butao Zhang <butaozha...@163.com> wrote:
>
> Hi, Okumin
>
>
> I have encountered this issue before, and 'validWriteIdList' is also an
> incompatible parameter. I have submitted a PR in the trino-hive-apache repo;
> you can refer to https://github.com/trinodb/trino-hive-apache/pull/43 .
> IIUC, the 'engine' parameter is used to differentiate between stats
> produced by different engines (Hive, Spark, Presto, Impala), but it seems
> that the downstream engines do not want to adopt and implement the new
> 'engine' parameter.
> At present, if an engine (e.g. Trino) uses a customized Thrift API to
> interact with HMS, it must change its Thrift file to match the Thrift
> definition of HMS.
> BTW, maybe we can change the HMS Thrift file to make the 'engine' parameter
> optional, and then other customized Thrift clients will not have
> compatibility issues.
>
> Thanks,
>
> Butao Zhang
>
> ---- Replied Message ----
> | From | Okumin<m...@okumin.com> |
> | Date | 8/10/2023 23:41 |
> | To | <dev@hive.apache.org> |
> | Subject | How to use `engine` introduced by HIVE-22046 |
> Hi Hive developers,
>
> I noticed HIVE-22046 introduced incompatibility to Metastore APIs while I'm
> testing integration between Hive 4 and other software. If I understand
> correctly, clients are currently required to additionally specify the
> engine name when they get or update column statistics.
>
> - https://issues.apache.org/jira/browse/HIVE-22046
> - https://github.com/apache/hive/pull/741
>
> For example, Trino has a feature to use column stats and it fails. Note
> that I am not 100% sure about Trino's implementation or behavior.
>
> ```
> trino> create table hive.default.test_trino (id int);
> Query 20230810_152236_00004_t9n6h failed: Required field 'engine' is unset!
> Struct:TableStatsRequest(dbName:default, tblName:test_trino, colNames:[id],
> engine:null)
> ```
>
> I have two questions about this feature.
>
> (1) Should any engine use a unique engine name?
>
> I guess some software can store or use stats compatible with Hive. I wonder
> if it can reuse engine=hive in that case, or should use a different name
> like engine=trino.
>
> I see Impala gives a unique engine name to metastore. Taking a glance,
> Spark is unlikely to be using col stats of Hive directly.
>
> - https://issues.apache.org/jira/browse/IMPALA-8842
>
> (2) Should Hive Metastore use engine=hive as a default value?
>
> If other compatible software can reuse engine=hive, it could be an option
> to accept requests with the old format assuming its engine is "hive" for
> compatibility. Or should they explicitly specify engine=hive when using
> Hive 4?
>
> Regards,
> Okumin
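
For illustration only, the two behaviours discussed in this thread (rejecting a request whose `engine` is unset, versus treating `engine` as optional and falling back to "hive") might be sketched as below. This is plain Python, not Hive or Thrift code: the `TableStatsRequest` dataclass only mirrors the field names shown in the error message above, and `DEFAULT_ENGINE`, `validate_strict`, and `resolve_engine` are hypothetical names. Whether "hive" is the right fallback is exactly the open question (2).

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the fallback value floated in question (2) of the thread.
DEFAULT_ENGINE = "hive"


@dataclass
class TableStatsRequest:
    """Hypothetical stand-in for the Thrift struct, not the generated class."""
    db_name: str
    tbl_name: str
    col_names: List[str]
    engine: Optional[str] = None  # left unset by clients built on the old IDL


def validate_strict(request: TableStatsRequest) -> None:
    """Mimic the reported behaviour: fail when 'engine' is unset."""
    if request.engine is None:
        raise ValueError("Required field 'engine' is unset!")


def resolve_engine(request: TableStatsRequest) -> str:
    """Mimic the proposal: accept old-format requests and assume a default."""
    return request.engine if request.engine else DEFAULT_ENGINE


old_client_request = TableStatsRequest("default", "test_trino", ["id"])
impala_request = TableStatsRequest("default", "test_trino", ["id"], engine="impala")

print(resolve_engine(old_client_request))
print(resolve_engine(impala_request))
```

With the lenient path, a client that never sets `engine` keeps working, while a client such as Impala that sends its own name is unaffected.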