Zoltán Borók-Nagy created HIVE-27356:
----------------------------------------
Summary: Hive should write name of blob type instead of table name
in Puffin
Key: HIVE-27356
URL: https://issues.apache.org/jira/browse/HIVE-27356
Project: Hive
Issue Type: Bug
Reporter: Zoltán Borók-Nagy
Currently Hive writes the name of the table plus snapshot id as blob type:
[https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
Instead, it should write the name of the blob it writes. Table name and
snapshot id are redundant information anyway, as they can be inferred from the
location and filename of the puffin file.
Currently it writes a non-standard blob (standard blob types are listed
[here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
I think it would be better to write standard blobs for interoperability. But
if Hive wants to write non-standard blobs anyway, it should still come up with
a descriptive name for them, e.g. 'hive-column-statistics-v1'.
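
A minimal sketch of the difference, not taken from the Hive code base: the class name, the method names and the "-" separator below are hypothetical, and only the string 'hive-column-statistics-v1' comes from this issue. It contrasts a blob type derived from the table and snapshot with one that depends only on the kind of content stored.

```java
public class PuffinBlobTypeSketch {
    // Descriptive name suggested in this issue for Hive's non-standard statistics blob.
    static final String HIVE_COLUMN_STATISTICS_V1 = "hive-column-statistics-v1";

    // Hypothetical stand-in for the behaviour described above: the type string
    // changes with every table and snapshot. The "-" separator is an assumption.
    static String tableBasedType(String tableName, long snapshotId) {
        return tableName + "-" + snapshotId;
    }

    // Proposed behaviour: the type identifies the blob content only, so a reader
    // can decide whether it understands the blob without knowing the table.
    static String contentBasedType() {
        return HIVE_COLUMN_STATISTICS_V1;
    }

    public static void main(String[] args) {
        System.out.println(tableBasedType("db.tbl", 42L));
        System.out.println(contentBasedType());
    }
}
```

With a content-based type, a reader such as another engine can match on a fixed string instead of having to reconstruct the table name and snapshot id first.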