Re: How to use `engine` introduced by HIVE-22046

Butao Zhang Thu, 10 Aug 2023 18:39:53 -0700

Hi, Okumin


I have encountered this issue before, and 'validWriteIdList' is another 
incompatible parameter. I have submitted a PR to the trino-hive-apache repo; 
you can refer to https://github.com/trinodb/trino-hive-apache/pull/43 .
IIUC, the 'engine' parameter is used to differentiate between stats produced by 
different engines (Hive, Spark, Presto, Impala), but it seems that the downstream 
engines do not want to adopt and implement the new 'engine' 
parameter.
At present, if an engine (e.g. Trino) uses a customized Thrift API to 
interact with HMS, it must change its Thrift file to match the Thrift 
definition of HMS.
BTW, maybe we can change the HMS Thrift file to make the 'engine' parameter 
optional, so that other customized Thrift clients will not have compatibility 
issues.
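
A minimal sketch of what an optional 'engine' could mean on the receiving side, assuming the server falls back to a default when the field is unset. The function name and the fallback value "hive" are hypothetical and only for illustration; this is not the HMS implementation, and the actual default (if any) is not decided in this thread.

```python
from typing import Optional

# Assumption: "hive" is used as the fallback purely for illustration.
DEFAULT_ENGINE = "hive"


def resolve_engine(engine: Optional[str]) -> str:
    """Return the engine name to use, falling back when it is unset or empty."""
    return engine if engine else DEFAULT_ENGINE


# A client built against the older Thrift definition sends no engine.
print(resolve_engine(None))
# A client that knows about the field keeps its own value.
print(resolve_engine("impala"))
```

With this shape, older clients would keep working unchanged, while newer clients could still pass their own engine name.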

Thanks,

Butao Zhang

---- Replied Message ----
| From | Okumin<[email protected]> |
| Date | 8/10/2023 23:41 |
| To | <[email protected]> |
| Subject | How to use `engine` introduced by HIVE-22046 |
Hi Hive developers,

I noticed that HIVE-22046 introduced an incompatibility in the Metastore APIs
while I was testing integration between Hive 4 and other software. If I
understand correctly, clients are now required to additionally specify the
engine name when they get or update column statistics.

- https://issues.apache.org/jira/browse/HIVE-22046
- https://github.com/apache/hive/pull/741

For example, Trino has a feature that uses column stats, and it fails as
shown below. Note that I am not 100% sure about Trino's implementation or
behavior.

```
trino> create table hive.default.test_trino (id int);
Query 20230810_152236_00004_t9n6h failed: Required field 'engine' is unset!
Struct:TableStatsRequest(dbName:default, tblName:test_trino, colNames:[id],
engine:null)
```
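
The wording of this error looks like a required-field check that runs before the request is processed. The sketch below mimics such a check in Python for illustration only: the class is a hypothetical stand-in whose field names are copied from the error output above, not the generated Metastore code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableStatsRequest:
    """Hypothetical stand-in for the struct named in the error above."""
    dbName: str
    tblName: str
    colNames: List[str]
    engine: Optional[str] = None  # treated as required by validate() below

    def validate(self) -> None:
        # Mimics a required-field check: reject the request if engine is unset.
        if self.engine is None:
            raise ValueError(f"Required field 'engine' is unset! Struct:{self}")


def describe(request: TableStatsRequest) -> str:
    """Return 'ok' if the request validates, otherwise the error text."""
    try:
        request.validate()
        return "ok"
    except ValueError as err:
        return str(err)


# A client built against the older definition never sets engine.
old_client = TableStatsRequest("default", "test_trino", ["id"])
# A client updated for the new definition sets it explicitly.
new_client = TableStatsRequest("default", "test_trino", ["id"], engine="hive")

print(describe(old_client))
print(describe(new_client))
```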

I have two questions about this feature.

(1) Should every engine use a unique engine name?

I guess some software can store or use stats compatible with Hive. I wonder
whether it can reuse engine=hive in that case, or whether it should use a
different name like engine=trino.

I see Impala gives a unique engine name to metastore. Taking a glance,
Spark is unlikely to be using col stats of Hive directly.

- https://issues.apache.org/jira/browse/IMPALA-8842
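
To make the trade-off in question (1) concrete, here is a minimal sketch, assuming column statistics are stored and looked up per engine name. The `ColumnStatsStore` class and the `numNulls` value are hypothetical and only illustrate the lookup behavior; they are not Metastore code.

```python
from typing import Dict, Optional, Tuple

# Key: (database, table, column, engine).
StatsKey = Tuple[str, str, str, str]


class ColumnStatsStore:
    """Hypothetical in-memory store that keeps stats separately per engine."""

    def __init__(self) -> None:
        self._stats: Dict[StatsKey, dict] = {}

    def update(self, db: str, table: str, column: str, engine: str, stats: dict) -> None:
        self._stats[(db, table, column, engine)] = stats

    def get(self, db: str, table: str, column: str, engine: str) -> Optional[dict]:
        # Stats written under one engine name are invisible under another.
        return self._stats.get((db, table, column, engine))


store = ColumnStatsStore()
store.update("default", "test_trino", "id", "hive", {"numNulls": 0})

# Reusing engine=hive sees the stats written by Hive.
print(store.get("default", "test_trino", "id", "hive"))
# A distinct name such as engine=trino starts with no stats at all.
print(store.get("default", "test_trino", "id", "trino"))
```

Under this assumption, reusing engine=hive shares statistics with Hive, while a unique name isolates them.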

(2) Should Hive Metastore use engine=hive as a default value?

If other compatible software can reuse engine=hive, it could be an option
to accept requests with the old format assuming its engine is "hive" for
compatibility. Or should they explicitly specify engine=hive when using
Hive 4?

Regards,
Okumin
