Hi, Okumin
I have encountered this issue before, and 'validWriteIdList' is another parameter with the same incompatibility. I have submitted a PR to the trino-hive-apache repo; you can refer to https://github.com/trinodb/trino-hive-apache/pull/43 . IIUC, the 'engine' parameter is used to differentiate between stats produced by different engines (Hive, Spark, Presto, Impala), but it seems that the downstream engines do not want to adopt and implement the new 'engine' parameter. At present, if an engine (e.g. Trino) uses a customized Thrift API to interact with HMS, it has to change its Thrift file to match the HMS Thrift definition.

BTW, maybe we could change the HMS Thrift file to make the 'engine' parameter optional, so that other customized Thrift clients would not have compatibility issues.

Thanks,
Butao Zhang

---- Replied Message ----
| From | Okumin <m...@okumin.com> |
| Date | 8/10/2023 23:41 |
| To | <dev@hive.apache.org> |
| Subject | How to use `engine` introduced by HIVE-22046 |

Hi Hive developers,

I noticed that HIVE-22046 introduced an incompatibility in the Metastore APIs while testing the integration between Hive 4 and other software. If I understand correctly, clients are currently required to additionally specify the engine name when they get or update column statistics.

- https://issues.apache.org/jira/browse/HIVE-22046
- https://github.com/apache/hive/pull/741

For example, Trino has a feature that uses column stats, and it fails. Note that I am not 100% sure about Trino's implementation or behavior.

```
trino> create table hive.default.test_trino (id int);
Query 20230810_152236_00004_t9n6h failed: Required field 'engine' is unset! Struct:TableStatsRequest(dbName:default, tblName:test_trino, colNames:[id], engine:null)
```

I have two questions about this feature.

(1) Should every engine use a unique engine name?
I guess some software may store or use stats that are compatible with Hive's. I wonder whether it can reuse engine=hive in that case, or whether it should use a different name like engine=trino.
I see that Impala passes its own unique engine name to the Metastore. At a glance, Spark does not seem to use Hive's column stats directly.

- https://issues.apache.org/jira/browse/IMPALA-8842

(2) Should Hive Metastore use engine=hive as the default value?
If other compatible software can reuse engine=hive, one option would be to accept requests in the old format and assume their engine is "hive", for compatibility. Or should clients explicitly specify engine=hive when using Hive 4?

Regards,
Okumin
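To make the two options in this thread concrete, here is a minimal Python sketch contrasting a strict check on a required 'engine' field with the lenient behavior from question (2), where an unset 'engine' is treated as "hive". The dataclass `TableStatsRequest` only mirrors the field names shown in the error message; it, `DEFAULT_ENGINE`, `validate_required`, and `with_default_engine` are hypothetical names for illustration, not Thrift-generated code and not Hive Metastore's actual implementation.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

DEFAULT_ENGINE = "hive"  # hypothetical default, following question (2)


@dataclass(frozen=True)
class TableStatsRequest:
    """Hypothetical stand-in for the request struct in the error above."""
    dbName: str
    tblName: str
    colNames: List[str]
    engine: Optional[str] = None  # None models a client that never sets 'engine'


def validate_required(req: TableStatsRequest) -> None:
    """Strict behavior: 'engine' is required, so an unset value is rejected."""
    if req.engine is None:
        raise ValueError(f"Required field 'engine' is unset! Struct:{req}")


def with_default_engine(req: TableStatsRequest) -> TableStatsRequest:
    """Lenient behavior: an unset 'engine' is treated as DEFAULT_ENGINE."""
    if req.engine is None:
        return replace(req, engine=DEFAULT_ENGINE)
    return req  # an explicit engine (e.g. "impala") is kept as-is


# Usage: a request from an older client that does not know about 'engine'.
old_req = TableStatsRequest("default", "test_trino", ["id"])
try:
    validate_required(old_req)
except ValueError as err:
    print(err)
print(with_default_engine(old_req).engine)  # hive
```

The lenient variant keeps older clients working, but their stats are then silently attributed to "hive", which is exactly the trade-off question (1) raises.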