This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hive-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 610f3437 deploy: 5c9a634ef73277d23e7ce7537cdd8a425877cd12
610f3437 is described below

commit 610f3437acf88830689ccfe3b18b3438047afaa1
Author: okumin <[email protected]>
AuthorDate: Tue Feb 24 12:18:41 2026 +0000

    deploy: 5c9a634ef73277d23e7ce7537cdd8a425877cd12
---
 .../column-statistics-in-hive/index.html           | 165 ++++++++++++++++++++-
 .../desingdocs/default-constraint/index.html       |   2 +-
 development/desingdocs/design/index.html           |   2 +-
 .../index.html                                     |  53 +++++--
 index.json                                         |   2 +-
 5 files changed, 204 insertions(+), 20 deletions(-)

diff --git a/development/desingdocs/column-statistics-in-hive/index.html 
b/development/desingdocs/column-statistics-in-hive/index.html
index 020e9233..847583dd 100644
--- a/development/desingdocs/column-statistics-in-hive/index.html
+++ b/development/desingdocs/column-statistics-in-hive/index.html
@@ -4,7 +4,170 @@
 <button type=submit class=search-button aria-label="Submit search">
 <i class="fas 
fa-search"></i></button></div></form></div></div></div></nav></menu></header><div
 class=content><div class=docs-container><main class="docs-main 
docs-main-full"><article class=docs-content><nav 
class=docs-breadcrumb><ol><li><a href=/><i class="fas fa-home"></i> 
Home</a></li><li><a href=/docs/>Documentation</a></li><li class=active>Apache 
Hive : Column Statistics in Hive</li></ol></nav><header class=docs-header><h1 
class=docs-title>Apache Hive : Column Statistics in Hive</h1 [...]
 Last updated: December 12, 2024</span></div></header><div 
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav 
id=TableOfContents><ul><li><a 
href=#apache-hive--column-statistics-in-hive>Apache Hive : Column Statistics in 
Hive</a><ul><li><ul><li><a 
href=#introduction><strong>Introduction</strong></a></li><li><a 
href=#hiveql-changes><strong>HiveQL changes</strong></a></li><li><a 
href=#metastore-schema><strong>Metastore Schema</strong></a></li><li><a 
href=#metastore-thr [...]
-</code></pre><h3 id=metastore-schema><strong>Metastore 
Schema</strong></h3><p>To persist column level statistics, we propose to add 
the following new tables,</p><p>CREATE TABLE TAB_COL_STATS<br>(<br>CS_ID NUMBER 
NOT NULL,<br>TBL_ID NUMBER NOT NULL,<br>COLUMN_NAME VARCHAR(128) NOT 
NULL,<br>COLUMN_TYPE VARCHAR(128) NOT NULL,<br>TABLE_NAME VARCHAR(128) NOT 
NULL,<br>DB_NAME VARCHAR(128) NOT NULL,</p><p>LOW_VALUE RAW,<br>HIGH_VALUE 
RAW,<br>NUM_NULLS BIGINT,<br>NUM_DISTINCTS BIGINT,</p><p>BIT_ [...]
+</code></pre><h3 id=metastore-schema><strong>Metastore 
Schema</strong></h3><p>To persist column level statistics, we propose to add 
the following new tables,</p><pre tabindex=0><code>CREATE TABLE TAB_COL_STATS  
+ (  
+ CS_ID NUMBER NOT NULL,  
+ TBL_ID NUMBER NOT NULL,  
+ COLUMN_NAME VARCHAR(128) NOT NULL,  
+ COLUMN_TYPE VARCHAR(128) NOT NULL,  
+ TABLE_NAME VARCHAR(128) NOT NULL,  
+ DB_NAME VARCHAR(128) NOT NULL,
+
+LOW_VALUE RAW,  
+ HIGH_VALUE RAW,  
+ NUM_NULLS BIGINT,  
+ NUM_DISTINCTS BIGINT,
+
+BIT_VECTOR BLOB,  /* introduced in 
[HIVE-16997](https://issues.apache.org/jira/browse/HIVE-16997) in Hive 3.0.0 */
+
+AVG_COL_LEN DOUBLE,  
+ MAX_COL_LEN BIGINT,  
+ NUM_TRUES BIGINT,  
+ NUM_FALSES BIGINT,  
+ LAST_ANALYZED BIGINT NOT NULL)
+
+ALTER TABLE TAB_COL_STATS ADD CONSTRAINT TAB_COL_STATS_PK PRIMARY KEY 
(CS_ID);
+
+ALTER TABLE TAB_COL_STATS ADD CONSTRAINT TAB_COL_STATS_FK1 FOREIGN KEY 
(TBL_ID) REFERENCES TBLS (TBL_ID) INITIALLY DEFERRED;
+
+CREATE TABLE PART_COL_STATS  
+ (  
+ CS_ID NUMBER NOT NULL,  
+ PART_ID NUMBER NOT NULL,
+
+DB_NAME VARCHAR(128) NOT NULL,  
+ COLUMN_NAME VARCHAR(128) NOT NULL,  
+ COLUMN_TYPE VARCHAR(128) NOT NULL,  
+ TABLE_NAME VARCHAR(128) NOT NULL,  
+ PART_NAME VARCHAR(128) NOT NULL,
+
+LOW_VALUE RAW,  
+ HIGH_VALUE RAW,  
+ NUM_NULLS BIGINT,  
+ NUM_DISTINCTS BIGINT,
+
+BIT_VECTOR BLOB,  /* introduced in 
[HIVE-16997](https://issues.apache.org/jira/browse/HIVE-16997) in Hive 3.0.0 */
+
+AVG_COL_LEN DOUBLE,  
+ MAX_COL_LEN BIGINT,  
+ NUM_TRUES BIGINT,  
+ NUM_FALSES BIGINT,  
+ LAST_ANALYZED BIGINT NOT NULL)
+
+ALTER TABLE PART_COL_STATS ADD CONSTRAINT PART_COL_STATS_PK PRIMARY KEY 
(CS_ID);
+
+ALTER TABLE PART_COL_STATS ADD CONSTRAINT PART_COL_STATS_FK1 FOREIGN KEY 
(PART_ID) REFERENCES PARTITIONS (PART_ID) INITIALLY DEFERRED;
+</code></pre><h3 id=metastore-thrift-api><strong>Metastore Thrift 
API</strong></h3><p>We propose to add the following Thrift structs to transport 
column statistics:</p><pre tabindex=0><code>struct BooleanColumnStatsData {  
+ 1: required i64 numTrues,  
+ 2: required i64 numFalses,  
+ 3: required i64 numNulls  
+ }
+
+struct DoubleColumnStatsData {  
+ 1: required double lowValue,  
+ 2: required double highValue,  
+ 3: required i64 numNulls,  
+ 4: required i64 numDVs,
+
+5: optional string bitVectors
+
+}
+
+struct LongColumnStatsData {  
+ 1: required i64 lowValue,  
+ 2: required i64 highValue,  
+ 3: required i64 numNulls,  
+ 4: required i64 numDVs,
+
+5: optional string bitVectors  
+ }
+
+struct StringColumnStatsData {  
+ 1: required i64 maxColLen,  
+ 2: required double avgColLen,  
+ 3: required i64 numNulls,  
+ 4: required i64 numDVs,
+
+5: optional string bitVectors  
+ }
+
+struct BinaryColumnStatsData {  
+ 1: required i64 maxColLen,  
+ 2: required double avgColLen,  
+ 3: required i64 numNulls  
+ }
+
+struct Decimal {  
+1: required binary unscaled,  
+3: required i16 scale  
+}
+
+struct DecimalColumnStatsData {  
+1: optional Decimal lowValue,  
+2: optional Decimal highValue,  
+3: required i64 numNulls,  
+4: required i64 numDVs,  
+5: optional string bitVectors  
+}
+
+struct Date {  
+1: required i64 daysSinceEpoch  
+}
+
+struct DateColumnStatsData {  
+1: optional Date lowValue,  
+2: optional Date highValue,  
+3: required i64 numNulls,  
+4: required i64 numDVs,  
+5: optional string bitVectors  
+}
+
+union ColumnStatisticsData {  
+1: BooleanColumnStatsData booleanStats,  
+2: LongColumnStatsData longStats,  
+3: DoubleColumnStatsData doubleStats,  
+4: StringColumnStatsData stringStats,  
+5: BinaryColumnStatsData binaryStats,  
+6: DecimalColumnStatsData decimalStats,  
+7: DateColumnStatsData dateStats  
+}
+
+struct ColumnStatisticsObj {  
+ 1: required string colName,  
+ 2: required string colType,  
+ 3: required ColumnStatisticsData statsData  
+ }
+
+struct ColumnStatisticsDesc {  
+ 1: required bool isTblLevel,   
+ 2: required string dbName,  
+ 3: required string tableName,  
+ 4: optional string partName,  
+ 5: optional i64 lastAnalyzed  
+ }
+
+struct ColumnStatistics {  
+ 1: required ColumnStatisticsDesc statsDesc,  
+ 2: required list&lt;ColumnStatisticsObj&gt; statsObj;  
+ }
+</code></pre><p>We propose to add the following Thrift APIs to persist, 
retrieve and delete column statistics:</p><pre tabindex=0><code>bool 
update_table_column_statistics(1:ColumnStatistics stats_obj) throws 
(1:NoSuchObjectException o1,   
+ 2:InvalidObjectException o2, 3:MetaException o3, 4:InvalidInputException o4)  
+ bool update_partition_column_statistics(1:ColumnStatistics stats_obj) throws 
(1:NoSuchObjectException o1,   
+ 2:InvalidObjectException o2, 3:MetaException o3, 4:InvalidInputException o4)
+
+ColumnStatistics get_table_column_statistics(1:string db_name, 2:string 
tbl_name, 3:string col_name) throws  
+ (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidInputException o3, 
4:InvalidObjectException o4)   
+ ColumnStatistics get_partition_column_statistics(1:string db_name, 2:string 
tbl_name, 3:string part_name,  
+ 4:string col_name) throws (1:NoSuchObjectException o1, 2:MetaException o2,   
+ 3:InvalidInputException o3, 4:InvalidObjectException o4)
+
+bool delete_partition_column_statistics(1:string db_name, 2:string tbl_name, 
3:string part_name, 4:string col_name) throws   
+ (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidObjectException o3, 
  
+ 4:InvalidInputException o4)  
+ bool delete_table_column_statistics(1:string db_name, 2:string tbl_name, 
3:string col_name) throws   
+ (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidObjectException o3, 
  
+ 4:InvalidInputException o4)
+</code></pre><p>Note that the delete_table_column_statistics and 
delete_partition_column_statistics calls are needed to remove the entries from 
the metastore when a table is dropped. Also note that currently Hive doesn’t 
support dropping a column.</p><p>Note that in V1 of the project, we will 
support only scalar statistics. Furthermore, we will support only static 
partitions, i.e., both the partition key and partition value should be 
specified in the analyze command. In a following version, we will add support 
for height balanced histograms as well [...]
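The ColumnStatisticsData union above carries exactly one per-type variant for each column. As a rough illustration of that one-of shape (plain Python stand-ins, not the generated hive_metastore Thrift classes):

```python
from dataclasses import dataclass
from typing import Optional, Union

# Illustrative mirrors of two of the Thrift structs above; the real
# generated classes live in the hive_metastore Thrift bindings.
@dataclass
class LongColumnStatsData:
    lowValue: int
    highValue: int
    numNulls: int
    numDVs: int
    bitVectors: Optional[str] = None  # optional field 5, added by HIVE-16997

@dataclass
class StringColumnStatsData:
    maxColLen: int
    avgColLen: float
    numNulls: int
    numDVs: int
    bitVectors: Optional[str] = None

# The Thrift union: exactly one variant is set per column.
ColumnStatisticsData = Union[LongColumnStatsData, StringColumnStatsData]

def ndv_estimate(stats: ColumnStatisticsData) -> int:
    """Number of distinct values, regardless of which variant is set."""
    return stats.numDVs

age_stats = LongColumnStatsData(lowValue=0, highValue=95, numNulls=3, numDVs=87)
name_stats = StringColumnStatsData(maxColLen=24, avgColLen=6.5, numNulls=0,
                                   numDVs=120)
print(ndv_estimate(age_stats))   # -> 87
print(ndv_estimate(name_stats))  # -> 120
```

A consumer such as the query optimizer can dispatch on which variant is present without caring about the column's concrete type.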
 <i class="fas fa-thumbs-up"></i> Yes
 </button>
 <button class="btn btn-feedback btn-negative">
diff --git a/development/desingdocs/default-constraint/index.html 
b/development/desingdocs/default-constraint/index.html
index abff6949..12b33791 100644
--- a/development/desingdocs/default-constraint/index.html
+++ b/development/desingdocs/default-constraint/index.html
@@ -3,7 +3,7 @@
 <span class=navbar-toggler-icon></span></button><div class="collapse 
navbar-collapse" id=navbarSupportedContent><ul class="navbar-nav me-auto"><li 
class=nav-item><a class=nav-link 
href=https://hive.apache.org//general/downloads>Releases</a></li><li 
class="nav-item dropdown"><a class="nav-link dropdown-toggle" href=/Document 
id=docsDropdown role=button data-bs-toggle=dropdown 
aria-expanded=false>Documentation</a><ul class=dropdown-menu 
aria-labelledby=docsDropdown><li><a class=dropdown-it [...]
 <button type=submit class=search-button aria-label="Submit search">
 <i class="fas 
fa-search"></i></button></div></form></div></div></div></nav></menu></header><div
 class=content><div class=docs-container><main class="docs-main 
docs-main-full"><article class=docs-content><nav 
class=docs-breadcrumb><ol><li><a href=/><i class="fas fa-home"></i> 
Home</a></li><li><a href=/docs/>Documentation</a></li><li class=active>Apache 
Hive : Default Constraint (HIVE-18726)</li></ol></nav><header 
class=docs-header><h1 class=docs-title>Apache Hive : Default Constraint (HIV 
[...]
-Last updated: December 12, 2024</span></div></header><div 
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav 
id=TableOfContents><ul><li><a 
href=#apache-hive--default-constraint-hive-18726>Apache Hive : Default 
Constraint (HIVE-18726)</a></li><li><a 
href=#introduction>Introduction</a></li><li><a 
href=#background>Background</a></li><li><a href=#proposed-changes>Proposed 
Changes</a><ul><li><a href=#create-table>Create Table</a></li><li><a 
href=#insert>INSERT</a></li>< [...]
+Last updated: December 12, 2024</span></div></header><div 
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav 
id=TableOfContents><ul><li><a 
href=#apache-hive--default-constraint-hive-18726>Apache Hive : Default 
Constraint (HIVE-18726)</a></li><li><a 
href=#introduction>Introduction</a></li><li><a 
href=#background>Background</a></li><li><a href=#proposed-changes>Proposed 
Changes</a><ul><li><a href=#create-table>Create Table</a></li><li><a 
href=#insert>INSERT</a></li>< [...]
 <i class="fas fa-thumbs-up"></i> Yes
 </button>
 <button class="btn btn-feedback btn-negative">
diff --git a/development/desingdocs/design/index.html 
b/development/desingdocs/design/index.html
index 8e5892b0..cc2de1d2 100644
--- a/development/desingdocs/design/index.html
+++ b/development/desingdocs/design/index.html
@@ -3,7 +3,7 @@
 <span class=navbar-toggler-icon></span></button><div class="collapse 
navbar-collapse" id=navbarSupportedContent><ul class="navbar-nav me-auto"><li 
class=nav-item><a class=nav-link 
href=https://hive.apache.org//general/downloads>Releases</a></li><li 
class="nav-item dropdown"><a class="nav-link dropdown-toggle" href=/Document 
id=docsDropdown role=button data-bs-toggle=dropdown 
aria-expanded=false>Documentation</a><ul class=dropdown-menu 
aria-labelledby=docsDropdown><li><a class=dropdown-it [...]
 <button type=submit class=search-button aria-label="Submit search">
 <i class="fas 
fa-search"></i></button></div></form></div></div></div></nav></menu></header><div
 class=content><div class=docs-container><main class="docs-main 
docs-main-full"><article class=docs-content><nav 
class=docs-breadcrumb><ol><li><a href=/><i class="fas fa-home"></i> 
Home</a></li><li><a href=/docs/>Documentation</a></li><li class=active>Apache 
Hive : Design</li></ol></nav><header class=docs-header><h1 
class=docs-title>Apache Hive : Design</h1><div class=docs-meta><span class=docs 
[...]
-Last updated: December 12, 2024</span></div></header><div 
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav 
id=TableOfContents><ul><li><a href=#apache-hive--design>Apache Hive : 
Design</a><ul><li><a href=#hive-architecture>Hive Architecture</a></li><li><a 
href=#hive-data-model>Hive Data Model</a></li><li><a 
href=#metastore>Metastore</a><ul><li><a 
href=#motivation>Motivation</a></li><li><a href=#metadata-objects>Metadata 
Objects</a></li><li><a href=#metastore-archi [...]
+Last updated: December 12, 2024</span></div></header><div 
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav 
id=TableOfContents><ul><li><a href=#apache-hive--design>Apache Hive : 
Design</a><ul><li><a href=#hive-architecture>Hive Architecture</a></li><li><a 
href=#hive-data-model>Hive Data Model</a></li><li><a 
href=#metastore>Metastore</a><ul><li><a 
href=#motivation>Motivation</a></li><li><a href=#metadata-objects>Metadata 
Objects</a></li><li><a href=#metastore-archi [...]
 <i class="fas fa-thumbs-up"></i> Yes
 </button>
 <button class="btn btn-feedback btn-negative">
diff --git 
a/development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition/index.html
 
b/development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition/index.html
index a7ad55b7..9f8f7f2e 100644
--- 
a/development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition/index.html
+++ 
b/development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition/index.html
@@ -4,22 +4,43 @@
 <button type=submit class=search-button aria-label="Submit search">
 <i class="fas 
fa-search"></i></button></div></form></div></div></div></nav></menu></header><div
 class=content><div class=docs-container><main class="docs-main 
docs-main-full"><article class=docs-content><nav 
class=docs-breadcrumb><ol><li><a href=/><i class="fas fa-home"></i> 
Home</a></li><li><a href=/docs/>Documentation</a></li><li class=active>Apache 
Hive : Hbase execution plans for RawStore partition filter 
condition</li></ol></nav><header class=docs-header><h1 class=docs-title>Apache  
[...]
 Last updated: December 12, 2024</span></div></header><div 
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav 
id=TableOfContents><ul><li><a 
href=#apache-hive--hbase-execution-plans-for-rawstore-partition-filter-condition>Apache
 Hive : Hbase execution plans for RawStore partition filter 
condition</a><ul><li><a href=#><img 
src="https://issues.apache.org/jira/secure/viewavatar?size=xsmall&amp;avatarId=21140&amp;avatarType=issuetype";
 alt>HIVE-9452</a></li><li><a href=#n [...]
-Open</p><p>Functionality needed</p><p>RawStore functions that support 
partition filtering are the following 
-</p><ul><li>getPartitionsByExpr</li><li>getPartitionsByFilter (takes filter 
string as argument, used from hcatalog)</li></ul><p>We need to generate a query 
execution plan in terms of Hbase scan api calls for a given filter 
condition.</p><h2 id=notes-about-the-api-to-be-supported>Notes about the api to 
be supported</h2><p>getPartitionsByExpr - Current partition expression 
evaluatio [...]
-| p1 > 10 and p1 &lt; 20 | Scan(X10+, X20) |
-| p1 = 10 (if single partition column) | Scan(X10, X10+). Optimized? : 
Get(X10) |
-| Similar case as above, if all partition columns are specified | |
-| p1 = 10 (multiple partition column) | Scan(X10, X+) |
-| p1 = 9 or p1 = 10 | merge( get(X9), get(X10)) |
-| p1 > 10 or p1 &lt; 20 | merge(scan(X10, X+), scan(X  ,X20)) |
-| (condition on columns other than first partition column) : condition1 | 
Scan(X, X+).setFilter(genFilter(condition1)) |
-| p1 > 10 and condition1 | scan(X10, X+).setFilter(genFilter(condition1)) |
-| p1 &lt; 20 and condition1 | Scan(X , X20).setFilter(genFilter(condition1)) |
-| p1 > 10 and p1 > 20 and p1 &lt; 30 and p1 &lt; 40 | Scan(X20+, X30) |
-| p1 > 10 and (p1 > 20 or c1 = 5) =>(p1 > 10 and p1 > 20) or (p1 > 10 and c1 
=5) | merge(Scan(X20+, X+), Scan(X10+,X+).setFilter(genFilter(c1 = 5))) |
-| (special case with OR condition, if one of the conditions results in full 
table scan): condition1 or condition2 | 
Scan(X).filter(getCombinedFilter(condition1, condition2) (ie, convert to a full 
table scan with filter) |
-| (general case with OR condition): condition1 or condition2 | merge( 
getResult(condition1), getResult(condition2)) |
-| c1 and (c2 or c3) | (c1 and c2) or (c1 and c3) |
-| (c1 or c2) and (c3 or c4) | (c1 and c3) or (c2 and c3) or (c1 and c4) or (c2 
and c4) |</p><p> </p><p>Relevant classes :</p><p>Input:</p><p>ExpressionTree 
(existing) - TreeNodes for AND/OR expressions. Leaf Node for leaf expressions 
with  =,&lt; &mldr;</p><p>Output:</p><p> public static abstract class 
FilterPlan {</p><p>   abstract FilterPlan and(FilterPlan other);</p><p>   
abstract FilterPlan or(FilterPlan other);</p><p>   abstract List 
getPlans();</p><p> }</p><p>// represents a union  [...]
+Open</p><p>Functionality needed</p><p>RawStore functions that support 
partition filtering are the following 
-</p><ul><li>getPartitionsByExpr</li><li>getPartitionsByFilter (takes filter 
string as argument, used from hcatalog)</li></ul><p>We need to generate a query 
execution plan in terms of Hbase scan api calls for a given filter 
condition.</p><h2 id=notes-about-the-api-to-be-supported>Notes about the api to 
be supported</h2><p>getPartitionsByExpr - Current partition expression 
evaluatio [...]
+    abstract FilterPlan and(FilterPlan other);
+    abstract FilterPlan or(FilterPlan other);
+    abstract List&lt;ScanPlan&gt; getPlans();
+  }
+  
+// represents a union of multiple ScanPlan
+MultiScanPlan extends FilterPlan
+  
+  
+
+ScanPlan extends FilterPlan
+    // represents the Scan start
+    private ScanMarker startMarker;
+    // represents the Scan end
+    private ScanMarker endMarker;
+    private ScanFilter filter;
+  
+public FilterPlan and(FilterPlan other) {
+ // calls this.and(otherScanPlan) on each scan plan in other
+}
+private ScanPlan and(ScanPlan other) {
+   // combines start marker and end marker and filters of this and other
+}
+public FilterPlan or(FilterPlan other) {
+   // just create a new FilterPlan from other, with this additional plan
+}
+  
+
+PartitionFilterGenerator -
+  /**
+   * Visitor for ExpressionTree.
+   * It first generates the ScanPlan for the leaf nodes. The higher level 
nodes are
+   * either AND or OR operations. It then calls FilterPlan.and and 
FilterPlan.or with
+   * the child nodes to generate the plans for higher level nodes.
+   */
+</code></pre><p>Initial implementation: Convert from ExpressionTree to 
Hbase filter, thereby implementing both getPartitionsByFilter and 
getPartitionsByExpr.</p><p>A new custom Filter class implementation needs to be 
created. Filter class implements Writable, and the hbase expression to be 
evaluated is serialized.</p><p>We can potentially create the filter directly 
from ExprNodeGenericFuncDesc in case the new fastpath config is 
set.</p></div><footer class=docs-footer><div class=doc [...]
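The and/or combination rules sketched above can be modeled with a small, self-contained example (hypothetical Python stand-ins for the ScanPlan/MultiScanPlan classes; markers simplified to plain integers, with None meaning an open-ended scan boundary). It reproduces two rows of the table above: `p1 > 10 and p1 < 20` becomes Scan(X10+, X20), and `p1 > 10 and p1 > 20 and p1 < 30 and p1 < 40` collapses to Scan(X20+, X30).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ScanPlan:
    start: Optional[int]  # lower marker, e.g. p1 > 10 -> start=10 ("X10+")
    end: Optional[int]    # upper marker, e.g. p1 < 20 -> end=20 ("X20")

    def and_(self, other: "ScanPlan") -> "ScanPlan":
        # AND intersects the two ranges: keep the tighter marker on each side.
        if self.start is None:
            start = other.start
        elif other.start is None:
            start = self.start
        else:
            start = max(self.start, other.start)
        if self.end is None:
            end = other.end
        elif other.end is None:
            end = self.end
        else:
            end = min(self.end, other.end)
        return ScanPlan(start, end)

    def or_(self, other: "ScanPlan") -> "MultiScanPlan":
        # OR is a union of scans: merge the plans into a MultiScanPlan.
        return MultiScanPlan([self, other])

@dataclass
class MultiScanPlan:
    plans: List[ScanPlan]

# p1 > 10 and p1 < 20  =>  Scan(X10+, X20)
p = ScanPlan(10, None).and_(ScanPlan(None, 20))

# p1 > 10 and p1 > 20 and p1 < 30 and p1 < 40  =>  Scan(X20+, X30)
q = (ScanPlan(10, None).and_(ScanPlan(20, None))
     .and_(ScanPlan(None, 30)).and_(ScanPlan(None, 40)))

# p1 = 9 or p1 = 10  =>  merge(get(X9), get(X10))
m = ScanPlan(9, 9).or_(ScanPlan(10, 10))
```

A real implementation would also distribute AND over OR (as in the `c1 and (c2 or c3)` row above) and attach an HBase Filter for conditions on non-leading partition columns; this sketch only covers the range intersection and union steps.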
 <i class="fas fa-thumbs-up"></i> Yes
 </button>
 <button class="btn btn-feedback btn-negative">
diff --git a/index.json b/index.json
index ad346d79..673dbda5 100644
--- a/index.json
+++ b/index.json
@@ -1 +1 @@
-[{"categories":null,"contents":"Apache Hive : Iceberg REST Catalog API backed 
by Hive Metastore Introduction Hive Metastore offers Iceberg REST API endpoints 
for clients native to Apache Iceberg. Consequently, Iceberg users can access 
Iceberg tables via either Hive Metastore Thrift API (using HiveCatalog) or 
Iceberg REST Catalog API.\nBasic configurations You must configure the 
following parameters.\nKey Required? Default Value 
metastore.catalog.servlet.port Yes -1 The port number to whi [...]
\ No newline at end of file
+[{"categories":null,"contents":"Apache Hive : Iceberg REST Catalog API backed 
by Hive Metastore Introduction Hive Metastore offers Iceberg REST API endpoints 
for clients native to Apache Iceberg. Consequently, Iceberg users can access 
Iceberg tables via either Hive Metastore Thrift API (using HiveCatalog) or 
Iceberg REST Catalog API.\nBasic configurations You must configure the 
following parameters.\nKey Required? Default Value 
metastore.catalog.servlet.port Yes -1 The port number to whi [...]
\ No newline at end of file
