[
https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy updated HIVE-27356:
-------------------------------------
Summary: Hive should write name of blob type instead of table name in
Puffin (was: Hive should write name of blob type instead of table name in
Puffing)
> Hive should write name of blob type instead of table name in Puffin
> -------------------------------------------------------------------
>
> Key: HIVE-27356
> URL: https://issues.apache.org/jira/browse/HIVE-27356
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Priority: Major
>
> Currently Hive writes the name of the table plus snapshot id as blob type:
> [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
> Instead, it should write the type of the blob it writes. Table name and
> snapshot id are redundant information anyway, as they can be inferred from
> the location and filename of the Puffin file.
> Currently it writes a non-standard blob (standard blob types are listed
> [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
> I think it would be better to write standard blobs for interoperability. But
> if Hive wants to write non-standard blobs anyway, it should still come up
> with a descriptive name for them, e.g. 'hive-column-statistics-v1'.
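To illustrate the naming point above, here is a minimal, self-contained sketch in plain Java. It does not use the Iceberg or Hive APIs. The class and method names are hypothetical, the "table name plus snapshot id" format is only an approximation of what the issue describes, and 'hive-column-statistics-v1' is the example name suggested in the issue, not an Iceberg standard blob type.

```java
public class BlobTypeNaming {
    // Example name suggested in the issue; an assumption, not a standard Puffin blob type.
    public static final String HIVE_COLUMN_STATS_BLOB_TYPE = "hive-column-statistics-v1";

    // Approximation of the behavior the issue describes: the blob type is built
    // from the table name and snapshot id, so it differs for every table and snapshot.
    public static String tableScopedType(String tableName, long snapshotId) {
        return tableName + "-" + snapshotId;
    }

    // A reader can only recognize the payload format if the blob type is a fixed,
    // descriptive string that identifies what the blob contains.
    public static boolean isHiveColumnStats(String blobType) {
        return HIVE_COLUMN_STATS_BLOB_TYPE.equals(blobType);
    }

    public static void main(String[] args) {
        String tableScoped = tableScopedType("db.tbl", 42L);
        System.out.println(tableScoped + " recognized: " + isHiveColumnStats(tableScoped));
        System.out.println(HIVE_COLUMN_STATS_BLOB_TYPE + " recognized: "
            + isHiveColumnStats(HIVE_COLUMN_STATS_BLOB_TYPE));
    }
}
```

With a table-scoped type, a reader would need to know the table name and snapshot id in advance just to identify the blob's format, whereas a fixed type can be matched directly.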