Re: How to use `engine` introduced by HIVE-22046

Okumin Tue, 09 Jan 2024 19:51:36 -0800

Hi Butao,

Thanks for reminding us about the issue. +1 for the approach.


Thanks,
Okumin

On Mon, Jan 8, 2024 at 1:38 PM Butao Zhang <butaozha...@163.com> wrote:
>
> Hi, dev
> Bump this thread!
> I just filed a ticket to track and fix the incompatibility in the HMS
> column stats Thrift API. I think we can fix this at its root on the HMS side,
> and then third-party components will no longer suffer from this issue.
> https://issues.apache.org/jira/browse/HIVE-27984
>
>
> Thanks,
> Butao Zhang
> ---- Replied Message ----
> | From | Okumin<m...@okumin.com> |
> | Date | 8/20/2023 18:37 |
> | To | <dev@hive.apache.org> |
> | Subject | Re: How to use `engine` introduced by HIVE-22046 |
> Hi Butao,
>
> Thanks for sharing your PR! I didn't find trinodb/trino-hive-apache
> or trinodb/hive-thrift.
>
> As mentioned in the PR, the current Thrift definitions might not be the
> final version, but it sounds reasonable to give information to external
> products since we versioned Hive 4 beta. I'm curious whether anyone knows why
> we give different engine names to Hive and Impala, and what the recommended
> options are.
>
> Thanks,
> Okumin
>
>
>
> On Fri, Aug 11, 2023 at 10:39 AM Butao Zhang <butaozha...@163.com> wrote:
>
> Hi, Okumin
>
>
> I have encountered this issue before, and 'validWriteIdList' is also an
> incompatible parameter. I have submitted a PR to the trino-hive-apache repo;
> see https://github.com/trinodb/trino-hive-apache/pull/43
> IIUC, the 'engine' parameter is used to differentiate between stats
> produced by different engines (Hive, Spark, Presto, Impala), but it seems that
> the downstream engines do not want to adopt and implement the new 'engine'
> parameter.
> At present, if an engine (e.g. Trino) uses a customized Thrift API to
> interact with HMS, it must change its Thrift file to match the Thrift
> definition of HMS.
> BTW, maybe we can change the HMS Thrift file to make the 'engine' parameter
> optional, so that other customized Thrift clients will not have
> compatibility issues.
>
> Thanks,
>
> Butao Zhang
>
> ---- Replied Message ----
> | From | Okumin<m...@okumin.com> |
> | Date | 8/10/2023 23:41 |
> | To | <dev@hive.apache.org> |
> | Subject | How to use `engine` introduced by HIVE-22046 |
> Hi Hive developers,
>
> I noticed that HIVE-22046 introduced an incompatibility in the Metastore APIs
> while I was testing integration between Hive 4 and other software. If I understand
> correctly, clients are currently required to additionally specify the
> engine name when they get or update column statistics.
>
> - https://issues.apache.org/jira/browse/HIVE-22046
> - https://github.com/apache/hive/pull/741
>
> For example, Trino has a feature that uses column stats, and it fails. Note
> that I am not 100% sure about Trino's implementation or behavior.
>
> ```
> trino> create table hive.default.test_trino (id int);
> Query 20230810_152236_00004_t9n6h failed: Required field 'engine' is unset!
> Struct:TableStatsRequest(dbName:default, tblName:test_trino, colNames:[id],
> engine:null)
> ```
>
> I have two questions about this feature.
>
> (1) Should every engine use a unique engine name?
>
> I guess some software can store or use stats compatible with Hive. I wonder
> if it can reuse engine=hive in that case, or should use a different name
> like engine=trino.
>
> I see Impala gives a unique engine name to the metastore. At a glance,
> Spark does not appear to use Hive's column stats directly.
>
> - https://issues.apache.org/jira/browse/IMPALA-8842
>
> (2) Should Hive Metastore use engine=hive as a default value?
>
> If other compatible software can reuse engine=hive, it could be an option
> to accept requests with the old format assuming its engine is "hive" for
> compatibility. Or should they explicitly specify engine=hive when using
> Hive 4?
>
> Regards,
> Okumin
>
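
The option raised in question (2) and in Butao's reply, accepting old-format
requests and assuming engine "hive", can be sketched roughly as follows. This
is only a toy model, not Hive Metastore code or generated Thrift classes: the
class, the function names, and the "impala" value are hypothetical and exist
only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Assumption: "hive" is the fallback engine name, as floated in question (2).
DEFAULT_ENGINE = "hive"


@dataclass(frozen=True)
class TableStatsRequest:
    """Hypothetical stand-in for the Thrift struct; not generated code."""
    db_name: str
    tbl_name: str
    col_names: List[str]
    engine: Optional[str] = None  # old-format clients leave this unset


def validate_strict(request: TableStatsRequest) -> None:
    """Models the current behavior: a request without an engine is rejected."""
    if request.engine is None:
        raise ValueError("Required field 'engine' is unset!")


def with_default_engine(request: TableStatsRequest) -> TableStatsRequest:
    """Models the proposed behavior: an unset engine falls back to the default."""
    if request.engine is None:
        return replace(request, engine=DEFAULT_ENGINE)
    return request


# An old-format client that does not know about 'engine'.
old_client = TableStatsRequest("default", "test_trino", ["id"])
# A client that names its engine explicitly (illustrative value).
impala_client = TableStatsRequest("default", "test_trino", ["id"], engine="impala")
```

With the fallback applied before validation, the old-format request would be
treated as engine "hive", while a request that names its engine is left
unchanged. Whether sharing engine=hive is actually safe depends on the answer
to question (1).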
