findepi commented on code in PR #8202:
URL: https://github.com/apache/iceberg/pull/8202#discussion_r1280593357


##########
core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java:
##########
@@ -26,4 +26,17 @@ private StandardBlobTypes() {}
    * href="https://datasketches.apache.org/";>Apache DataSketches</a> library
    */
   public static final String APACHE_DATASKETCHES_THETA_V1 = 
"apache-datasketches-theta-v1";
+
+  /**
+   * A serialized form of <a
+   * 
href="https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L854C17";>
+   * HIVE_COLUMN_STATS_OBJ</a>
+   */
+  public static final String HIVE_COLUMN_STATS_OBJ = "column-statistics-obj";

Review Comment:
   Nothing prevents Hive from using arbitrary String IDs for Hive-specific 
statistics, however I'd encourage to prefix Hive-specific statistics with eg 
"hive." to avoid conflicts in the future.
   
   If this is supposed to be a "standard blob type", it needs to be documented 
at https://iceberg.apache.org/puffin-spec/#blob-types in Hive-agnostic way



##########
core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java:
##########
@@ -26,4 +26,17 @@ private StandardBlobTypes() {}
    * href="https://datasketches.apache.org/";>Apache DataSketches</a> library
    */
   public static final String APACHE_DATASKETCHES_THETA_V1 = 
"apache-datasketches-theta-v1";
+
+  /**
+   * A serialized form of <a
+   * 
href="https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L854C17";>

Review Comment:
   nit: github links to a branch are likely to drift over time or even break 
(if eg module, package or class are renamed or removed)
   
   



##########
core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java:
##########
@@ -26,4 +26,17 @@ private StandardBlobTypes() {}
    * href="https://datasketches.apache.org/";>Apache DataSketches</a> library
    */
   public static final String APACHE_DATASKETCHES_THETA_V1 = 
"apache-datasketches-theta-v1";
+
+  /**
+   * A serialized form of <a
+   * 
href="https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L854C17";>
+   * HIVE_COLUMN_STATS_OBJ</a>
+   */
+  public static final String HIVE_COLUMN_STATS_OBJ = "column-statistics-obj";
+
+  /**
+   * A serialized form of KLL sketch produced by the <a
+   * href="https://datasketches.apache.org/";>Apache DataSketches</a> library
+   */
+  public static final String APACHE_DATASKETCHES_KLL_SKETCH = 
"apache_datasketches_kll_sketch";

Review Comment:
   Add documentation to https://iceberg.apache.org/puffin-spec/#blob-types
   and follow existing naming convention (hyphenated names).
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to