Zoltán Borók-Nagy created HIVE-27356:
----------------------------------------

             Summary: Hive should write name of blob type instead of table name 
in Puffin
                 Key: HIVE-27356
                 URL: https://issues.apache.org/jira/browse/HIVE-27356
             Project: Hive
          Issue Type: Bug
            Reporter: Zoltán Borók-Nagy


Currently Hive writes the table name plus the snapshot id as the blob type:

[https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]

Instead, it should write the name of the blob type it writes. The table name 
and snapshot id are redundant information anyway, as they can be inferred from 
the location and filename of the Puffin file.

Currently it writes a non-standard blob (standard blob types are listed 
[here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
I think it would be better to write standard blobs for interoperability. But 
if Hive wants to write non-standard blobs anyway, it should still come up with 
a descriptive name for them, e.g. 'hive-column-statistics-v1'.
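
To illustrate the difference, here is a minimal, self-contained Java sketch. It does not use the Hive or Iceberg APIs: the class BlobTypeSketch, its methods, and the "-" separator are hypothetical, and the only standard type it lists is 'apache-datasketches-theta-v1', which I assume is what StandardBlobTypes defines. It only shows why a fixed, descriptive type is easier for a reader to match on than a per-table, per-snapshot string.

```java
import java.util.Set;

// Hypothetical sketch; none of these names come from the Hive or Iceberg code base.
final class BlobTypeSketch {
    // Assumed to mirror the constant in Iceberg's StandardBlobTypes.
    static final String APACHE_DATASKETCHES_THETA_V1 = "apache-datasketches-theta-v1";
    static final Set<String> STANDARD_TYPES = Set.of(APACHE_DATASKETCHES_THETA_V1);

    // Descriptive non-standard name suggested in this issue.
    static final String HIVE_COLUMN_STATISTICS_V1 = "hive-column-statistics-v1";

    // Behaviour described in this issue: table name plus snapshot id.
    // The "-" separator is an assumption made for illustration.
    static String currentBlobType(String tableName, long snapshotId) {
        return tableName + "-" + snapshotId;
    }

    // Proposed behaviour: one fixed type, independent of table and snapshot.
    static String proposedBlobType() {
        return HIVE_COLUMN_STATISTICS_V1;
    }

    // True if the given type is one of the standard types known to this sketch.
    static boolean isStandard(String blobType) {
        return STANDARD_TYPES.contains(blobType);
    }

    public static void main(String[] args) {
        // The current type changes with every table and snapshot,
        // so a reader cannot select blobs by a known type string.
        System.out.println(currentBlobType("db.tbl", 42L));
        System.out.println(currentBlobType("db.tbl", 43L));

        // The proposed type is stable, so a reader can match on it directly.
        System.out.println(proposedBlobType());
        System.out.println(isStandard(proposedBlobType()));
    }
}
```

With a fixed type, an engine that does not know 'hive-column-statistics-v1' can at least skip the blob by type, instead of meeting a different unknown string in every Puffin file.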



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
