This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hive-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 610f3437 deploy: 5c9a634ef73277d23e7ce7537cdd8a425877cd12
610f3437 is described below
commit 610f3437acf88830689ccfe3b18b3438047afaa1
Author: okumin <[email protected]>
AuthorDate: Tue Feb 24 12:18:41 2026 +0000
deploy: 5c9a634ef73277d23e7ce7537cdd8a425877cd12
---
.../column-statistics-in-hive/index.html | 165 ++++++++++++++++++++-
.../desingdocs/default-constraint/index.html | 2 +-
development/desingdocs/design/index.html | 2 +-
.../index.html | 53 +++++--
index.json | 2 +-
5 files changed, 204 insertions(+), 20 deletions(-)
diff --git a/development/desingdocs/column-statistics-in-hive/index.html
b/development/desingdocs/column-statistics-in-hive/index.html
index 020e9233..847583dd 100644
--- a/development/desingdocs/column-statistics-in-hive/index.html
+++ b/development/desingdocs/column-statistics-in-hive/index.html
@@ -4,7 +4,170 @@
<button type=submit class=search-button aria-label="Submit search">
<i class="fas
fa-search"></i></button></div></form></div></div></div></nav></menu></header><div
class=content><div class=docs-container><main class="docs-main
docs-main-full"><article class=docs-content><nav
class=docs-breadcrumb><ol><li><a href=/><i class="fas fa-home"></i>
Home</a></li><li><a href=/docs/>Documentation</a></li><li class=active>Apache
Hive : Column Statistics in Hive</li></ol></nav><header class=docs-header><h1
class=docs-title>Apache Hive : Column Statistics in Hive</h1 [...]
Last updated: December 12, 2024</span></div></header><div
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav
id=TableOfContents><ul><li><a
href=#apache-hive--column-statistics-in-hive>Apache Hive : Column Statistics in
Hive</a><ul><li><ul><li><a
href=#introduction><strong>Introduction</strong></a></li><li><a
href=#hiveql-changes><strong>HiveQL changes</strong></a></li><li><a
href=#metastore-schema><strong>Metastore Schema</strong></a></li><li><a
href=#metastore-thr [...]
-</code></pre><h3 id=metastore-schema><strong>Metastore
Schema</strong></h3><p>To persist column level statistics, we propose to add
the following new tables,</p><p>CREATE TABLE TAB_COL_STATS<br>(<br>CS_ID NUMBER
NOT NULL,<br>TBL_ID NUMBER NOT NULL,<br>COLUMN_NAME VARCHAR(128) NOT
NULL,<br>COLUMN_TYPE VARCHAR(128) NOT NULL,<br>TABLE_NAME VARCHAR(128) NOT
NULL,<br>DB_NAME VARCHAR(128) NOT NULL,</p><p>LOW_VALUE RAW,<br>HIGH_VALUE
RAW,<br>NUM_NULLS BIGINT,<br>NUM_DISTINCTS BIGINT,</p><p>BIT_ [...]
+</code></pre><h3 id=metastore-schema><strong>Metastore
Schema</strong></h3><p>To persist column level statistics, we propose to add
the following new tables,</p><pre tabindex=0><code>CREATE TABLE TAB_COL_STATS
+ (
+ CS_ID NUMBER NOT NULL,
+ TBL_ID NUMBER NOT NULL,
+ COLUMN_NAME VARCHAR(128) NOT NULL,
+ COLUMN_TYPE VARCHAR(128) NOT NULL,
+ TABLE_NAME VARCHAR(128) NOT NULL,
+ DB_NAME VARCHAR(128) NOT NULL,
+
+LOW_VALUE RAW,
+ HIGH_VALUE RAW,
+ NUM_NULLS BIGINT,
+ NUM_DISTINCTS BIGINT,
+
+BIT_VECTOR, BLOB, /* introduced in
[HIVE-16997](https://issues.apache.org/jira/browse/HIVE-16997) in Hive 3.0.0 */
+
+AVG_COL_LEN DOUBLE,
+ MAX_COL_LEN BIGINT,
+ NUM_TRUES BIGINT,
+ NUM_FALSES BIGINT,
+ LAST_ANALYZED BIGINT NOT NULL)
+
+ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_PK PRIMARY KEY
(CS_ID);
+
+ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_FK1 FOREIGN KEY
(TBL_ID) REFERENCES TBLS (TBL_ID) INITIALLY DEFERRED ;
+
+CREATE TABLE PART_COL_STATS
+ (
+ CS_ID NUMBER NOT NULL,
+ PART_ID NUMBER NOT NULL,
+
+DB_NAME VARCHAR(128) NOT NULL,
+ COLUMN_NAME VARCHAR(128) NOT NULL,
+ COLUMN_TYPE VARCHAR(128) NOT NULL,
+ TABLE_NAME VARCHAR(128) NOT NULL,
+ PART_NAME VARCHAR(128) NOT NULL,
+
+LOW_VALUE RAW,
+ HIGH_VALUE RAW,
+ NUM_NULLS BIGINT,
+ NUM_DISTINCTS BIGINT,
+
+BIT_VECTOR, BLOB, /* introduced in
[HIVE-16997](https://issues.apache.org/jira/browse/HIVE-16997) in Hive 3.0.0 */
+
+AVG_COL_LEN DOUBLE,
+ MAX_COL_LEN BIGINT,
+ NUM_TRUES BIGINT,
+ NUM_FALSES BIGINT,
+ LAST_ANALYZED BIGINT NOT NULL)
+
+ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_PK PRIMARY KEY
(CS_ID);
+
+ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_FK1 FOREIGN KEY
(PART_ID) REFERENCES PARTITIONS (PART_ID) INITIALLY DEFERRED;
+</code></pre><h3 id=metastore-thrift-api><strong>Metastore Thrift
API</strong></h3><p>We propose to add the following Thrift structs to transport
column statistics:</p><pre tabindex=0><code>struct BooleanColumnStatsData {
+ 1: required i64 numTrues,
+ 2: required i64 numFalses,
+ 3: required i64 numNulls
+ }
+
+struct DoubleColumnStatsData {
+ 1: required double lowValue,
+ 2: required double highValue,
+ 3: required i64 numNulls,
+ 4: required i64 numDVs,
+
+5: optional string bitVectors
+
+}
+
+struct LongColumnStatsData {
+ 1: required i64 lowValue,
+ 2: required i64 highValue,
+ 3: required i64 numNulls,
+ 4: required i64 numDVs,
+
+5: optional string bitVectors
+ }
+
+struct StringColumnStatsData {
+ 1: required i64 maxColLen,
+ 2: required double avgColLen,
+ 3: required i64 numNulls,
+ 4: required i64 numDVs,
+
+5: optional string bitVectors
+ }
+
+struct BinaryColumnStatsData {
+ 1: required i64 maxColLen,
+ 2: required double avgColLen,
+ 3: required i64 numNulls
+ }
+
+struct Decimal {
+1: required binary unscaled,
+3: required i16 scale
+}
+
+struct DecimalColumnStatsData {
+1: optional Decimal lowValue,
+2: optional Decimal highValue,
+3: required i64 numNulls,
+4: required i64 numDVs,
+5: optional string bitVectors
+}
+
+struct Date {
+1: required i64 daysSinceEpoch
+}
+
+struct DateColumnStatsData {
+1: optional Date lowValue,
+2: optional Date highValue,
+3: required i64 numNulls,
+4: required i64 numDVs,
+5: optional string bitVectors
+}
+
+union ColumnStatisticsData {
+1: BooleanColumnStatsData booleanStats,
+2: LongColumnStatsData longStats,
+3: DoubleColumnStatsData doubleStats,
+4: StringColumnStatsData stringStats,
+5: BinaryColumnStatsData binaryStats,
+6: DecimalColumnStatsData decimalStats,
+7: DateColumnStatsData dateStats
+}
+
+struct ColumnStatisticsObj {
+ 1: required string colName,
+ 2: required string colType,
+ 3: required ColumnStatisticsData statsData
+ }
+
+struct ColumnStatisticsDesc {
+ 1: required bool isTblLevel,
+ 2: required string dbName,
+ 3: required string tableName,
+ 4: optional string partName,
+ 5: optional i64 lastAnalyzed
+ }
+
+struct ColumnStatistics {
+ 1: required ColumnStatisticsDesc statsDesc,
+ 2: required list<ColumnStatisticsObj> statsObj;
+ }
+</code></pre><p>We propose to add the following Thrift APIs to persist,
retrieve and delete column statistics:</p><pre tabindex=0><code>bool
update_table_column_statistics(1:ColumnStatistics stats_obj) throws
(1:NoSuchObjectException o1,
+ 2:InvalidObjectException o2, 3:MetaException o3, 4:InvalidInputException o4)
+ bool update_partition_column_statistics(1:ColumnStatistics stats_obj) throws
(1:NoSuchObjectException o1,
+ 2:InvalidObjectException o2, 3:MetaException o3, 4:InvalidInputException o4)
+
+ColumnStatistics get_table_column_statistics(1:string db_name, 2:string
tbl_name, 3:string col_name) throws
+ (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidInputException o3,
4:InvalidObjectException o4)
+ ColumnStatistics get_partition_column_statistics(1:string db_name, 2:string
tbl_name, 3:string part_name,
+ 4:string col_name) throws (1:NoSuchObjectException o1, 2:MetaException o2,
+ 3:InvalidInputException o3, 4:InvalidObjectException o4)
+
+bool delete_partition_column_statistics(1:string db_name, 2:string tbl_name,
3:string part_name, 4:string col_name) throws
+ (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidObjectException o3,
+ 4:InvalidInputException o4)
+ bool delete_table_column_statistics(1:string db_name, 2:string tbl_name,
3:string col_name) throws
+ (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidObjectException o3,
+ 4:InvalidInputException o4)
+</code></pre><p>Note that delete_column_statistics is needed to remove the
entries from the metastore when a table is dropped. Also note that currently
Hive doesn’t support drop column.</p><p>Note that in V1 of the project, we will
support only scalar statistics. Furthermore, we will support only static
partitions, i.e., both the partition key and partition value should be
specified in the analyze command. In a following version, we will add support
for height balanced histograms as well [...]
<i class="fas fa-thumbs-up"></i> Yes
</button>
<button class="btn btn-feedback btn-negative">
diff --git a/development/desingdocs/default-constraint/index.html
b/development/desingdocs/default-constraint/index.html
index abff6949..12b33791 100644
--- a/development/desingdocs/default-constraint/index.html
+++ b/development/desingdocs/default-constraint/index.html
@@ -3,7 +3,7 @@
<span class=navbar-toggler-icon></span></button><div class="collapse
navbar-collapse" id=navbarSupportedContent><ul class="navbar-nav me-auto"><li
class=nav-item><a class=nav-link
href=https://hive.apache.org//general/downloads>Releases</a></li><li
class="nav-item dropdown"><a class="nav-link dropdown-toggle" href=/Document
id=docsDropdown role=button data-bs-toggle=dropdown
aria-expanded=false>Documentation</a><ul class=dropdown-menu
aria-labelledby=docsDropdown><li><a class=dropdown-it [...]
<button type=submit class=search-button aria-label="Submit search">
<i class="fas
fa-search"></i></button></div></form></div></div></div></nav></menu></header><div
class=content><div class=docs-container><main class="docs-main
docs-main-full"><article class=docs-content><nav
class=docs-breadcrumb><ol><li><a href=/><i class="fas fa-home"></i>
Home</a></li><li><a href=/docs/>Documentation</a></li><li class=active>Apache
Hive : Default Constraint (HIVE-18726)</li></ol></nav><header
class=docs-header><h1 class=docs-title>Apache Hive : Default Constraint (HIV
[...]
-Last updated: December 12, 2024</span></div></header><div
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav
id=TableOfContents><ul><li><a
href=#apache-hive--default-constraint-hive-18726>Apache Hive : Default
Constraint (HIVE-18726)</a></li><li><a
href=#introduction>Introduction</a></li><li><a
href=#background>Background</a></li><li><a href=#proposed-changes>Proposed
Changes</a><ul><li><a href=#create-table>Create Table</a></li><li><a
href=#insert>INSERT</a></li>< [...]
+Last updated: December 12, 2024</span></div></header><div
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav
id=TableOfContents><ul><li><a
href=#apache-hive--default-constraint-hive-18726>Apache Hive : Default
Constraint (HIVE-18726)</a></li><li><a
href=#introduction>Introduction</a></li><li><a
href=#background>Background</a></li><li><a href=#proposed-changes>Proposed
Changes</a><ul><li><a href=#create-table>Create Table</a></li><li><a
href=#insert>INSERT</a></li>< [...]
<i class="fas fa-thumbs-up"></i> Yes
</button>
<button class="btn btn-feedback btn-negative">
diff --git a/development/desingdocs/design/index.html
b/development/desingdocs/design/index.html
index 8e5892b0..cc2de1d2 100644
--- a/development/desingdocs/design/index.html
+++ b/development/desingdocs/design/index.html
@@ -3,7 +3,7 @@
<span class=navbar-toggler-icon></span></button><div class="collapse
navbar-collapse" id=navbarSupportedContent><ul class="navbar-nav me-auto"><li
class=nav-item><a class=nav-link
href=https://hive.apache.org//general/downloads>Releases</a></li><li
class="nav-item dropdown"><a class="nav-link dropdown-toggle" href=/Document
id=docsDropdown role=button data-bs-toggle=dropdown
aria-expanded=false>Documentation</a><ul class=dropdown-menu
aria-labelledby=docsDropdown><li><a class=dropdown-it [...]
<button type=submit class=search-button aria-label="Submit search">
<i class="fas
fa-search"></i></button></div></form></div></div></div></nav></menu></header><div
class=content><div class=docs-container><main class="docs-main
docs-main-full"><article class=docs-content><nav
class=docs-breadcrumb><ol><li><a href=/><i class="fas fa-home"></i>
Home</a></li><li><a href=/docs/>Documentation</a></li><li class=active>Apache
Hive : Design</li></ol></nav><header class=docs-header><h1
class=docs-title>Apache Hive : Design</h1><div class=docs-meta><span class=docs
[...]
-Last updated: December 12, 2024</span></div></header><div
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav
id=TableOfContents><ul><li><a href=#apache-hive--design>Apache Hive :
Design</a><ul><li><a href=#hive-architecture>Hive Architecture</a></li><li><a
href=#hive-data-model>Hive Data Model</a></li><li><a
href=#metastore>Metastore</a><ul><li><a
href=#motivation>Motivation</a></li><li><a href=#metadata-objects>Metadata
Objects</a></li><li><a href=#metastore-archi [...]
+Last updated: December 12, 2024</span></div></header><div
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav
id=TableOfContents><ul><li><a href=#apache-hive--design>Apache Hive :
Design</a><ul><li><a href=#hive-architecture>Hive Architecture</a></li><li><a
href=#hive-data-model>Hive Data Model</a></li><li><a
href=#metastore>Metastore</a><ul><li><a
href=#motivation>Motivation</a></li><li><a href=#metadata-objects>Metadata
Objects</a></li><li><a href=#metastore-archi [...]
<i class="fas fa-thumbs-up"></i> Yes
</button>
<button class="btn btn-feedback btn-negative">
diff --git
a/development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition/index.html
b/development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition/index.html
index a7ad55b7..9f8f7f2e 100644
---
a/development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition/index.html
+++
b/development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition/index.html
@@ -4,22 +4,43 @@
<button type=submit class=search-button aria-label="Submit search">
<i class="fas
fa-search"></i></button></div></form></div></div></div></nav></menu></header><div
class=content><div class=docs-container><main class="docs-main
docs-main-full"><article class=docs-content><nav
class=docs-breadcrumb><ol><li><a href=/><i class="fas fa-home"></i>
Home</a></li><li><a href=/docs/>Documentation</a></li><li class=active>Apache
Hive : Hbase execution plans for RawStore partition filter
condition</li></ol></nav><header class=docs-header><h1 class=docs-title>Apache
[...]
Last updated: December 12, 2024</span></div></header><div
class=docs-toc><h4><i class="fas fa-list"></i> Table of Contents</h4><nav
id=TableOfContents><ul><li><a
href=#apache-hive--hbase-execution-plans-for-rawstore-partition-filter-condition>Apache
Hive : Hbase execution plans for RawStore partition filter
condition</a><ul><li><a href=#><img
src="https://issues.apache.org/jira/secure/viewavatar?size=xsmall&avatarId=21140&avatarType=issuetype"
alt>HIVE-9452</a></li><li><a href=#n [...]
-Open</p><p>Functionality needed</p><p>RawStore functions that support
partition filtering are the following
-</p><ul><li>getPartitionsByExpr</li><li>getPartitionsByFilter (takes filter
string as argument, used from hcatalog)</li></ul><p>We need to generate a query
execution plan in terms of Hbase scan api calls for a given filter
condition.</p><h2 id=notes-about-the-api-to-be-supported>Notes about the api to
be supported</h2><p>getPartitionsByExpr - Current partition expression
evaluatio [...]
-| p1 > 10 and p1 < 20 | Scan(X10+, X20) |
-| p1 = 10 (if single partition column) | Scan(X10, X10+). Optimized? :
Get(X10) |
-| Similar case as above, if all partition columns are specified | |
-| p1 = 10 (multiple partition column) | Scan(X10, X+) |
-| p1 = 9 or p1 = 10 | merge( get(X9), get(X10)) |
-| p1 > 10 or p1 < 20 | merge(scan(X10, X+), scan(X ,X20)) |
-| (condition on columns other than first partition column) : condition1 |
Scan(X, X+).setFilter(genFilter(condition1)) |
-| p1 > 10 and condition1 | scan(X10, X+).setFilter(genFilter(condition1)) |
-| p1 < 20 and condition1 | Scan(X , X20).setFilter(genFilter(condition1)) |
-| p1 > 10 and p1 > 20 and p1 < 30 and p1 < 40 | Scan(X20+, X30) |
-| p1 > 10 and (p1 > 20 or c1 = 5) =>(p1 > 10 and p1 > 20) or (p1 > 10 and c1
=5) | merge(Scan(X20+, X+), Scan(X10+,X+).setFilter(genFilter(c1 = 5))) |
-| (special case with OR condition, if one of the conditions results in full
table scan): condition1 or condition2 |
Scan(X).filter(getCombinedFilter(condition1, condition2) (ie, convert to a full
table scan with filter) |
-| (general case with OR condition): condition1 or condition2 | merge(
getResult(condition1), getResult(condition2)) |
-| c1 and (c2 or c3) | (c1 and c2) or (c1 and c3) |
-| (c1 or c2) and (c3 or c4) | (c1 and c3) or (c2 and c3) or (c1 and c4) or (c2
and c4) |</p><p> </p><p>Relevant classes :</p><p>Input:</p><p>ExpressionTree
(existing) - TreeNodes for AND/OR expressions. Leaf Node for leaf expressions
with =,< …</p><p>Output:</p><p> public static abstract class
FilterPlan {</p><p> abstract FilterPlan and(FilterPlan other);</p><p>
abstract FilterPlan or(FilterPlan other);</p><p> abstract List
getPlans();</p><p> }</p><p>// represents a union [...]
+Open</p><p>Functionality needed</p><p>RawStore functions that support
partition filtering are the following
-</p><ul><li>getPartitionsByExpr</li><li>getPartitionsByFilter (takes filter
string as argument, used from hcatalog)</li></ul><p>We need to generate a query
execution plan in terms of Hbase scan api calls for a given filter
condition.</p><h2 id=notes-about-the-api-to-be-supported>Notes about the api to
be supported</h2><p>getPartitionsByExpr - Current partition expression
evaluatio [...]
+ abstract FilterPlan and(FilterPlan other);
+ abstract FilterPlan or(FilterPlan other);
+ abstract List<ScanPlan> getPlans();
+ }
+
+// represents a union of multiple ScanPlan
+MultiScanPlan extends FilterPlan
+
+
+
+ScanPlan extends FilterPlan
+ // represent Scan start
+ private ScanMarker startMarker ;
+ // represent Scan end
+ private ScanMarker endMarker ;
+ private ScanFilter filter;
+
+public FilterPlan and(FilterPlan other) {
+ // calls this.and(otherScanPlan) on each scan plan in other
+}
+private ScanPlan and(ScanPlan other) {
+ // combines start marker and end marker and filters of this and other
+}
+public FilterPlan or(FilterPlan other) {
+ // just create a new FilterPlan from other, with this additional plan
+}
+
+
+PartitionFilterGenerator -
+ /**
+ * Visitor for ExpressionTree.
+ * It first generates the ScanPlan for the leaf nodes. The higher level
nodes are
+ * either AND or OR operations. It then calls FilterPlan.and and
FilterPlan.or with
+ * the child nodes to generate the plans for higher level nodes.
+ */
+</code></pre><p>Initial implementation: Convert from ExpressionTree to
Hbase filter, thereby implementing both getPartitionsByFilter and
getPartitionsByExpr</p><p>A new custom Filter class implementation needs to be
created. Filter class implements Writable, and the hbase expression to be
evaluated is serialized</p><p>We can potentially create the filter directly
from ExprNodeGenericFuncDesc in case of the new fastpath config is
set.</p></div><footer class=docs-footer><div class=doc [...]
<i class="fas fa-thumbs-up"></i> Yes
</button>
<button class="btn btn-feedback btn-negative">
diff --git a/index.json b/index.json
index ad346d79..673dbda5 100644
--- a/index.json
+++ b/index.json
@@ -1 +1 @@
-[{"categories":null,"contents":"Apache Hive : Iceberg REST Catalog API backed
by Hive Metastore Introduction Hive Metastore offers Iceberg REST API endpoints
for clients native to Apache Iceberg. Consequently, Iceberg users can access
Iceberg tables via either Hive Metastore Thrift API (using HiveCatalog) or
Iceberg REST Catalog API.\nBasic configurations You must configure the
following parameters.\nKey Required? Default Value
metastore.catalog.servlet.port Yes -1 The port number to whi [...]
\ No newline at end of file
+[{"categories":null,"contents":"Apache Hive : Iceberg REST Catalog API backed
by Hive Metastore Introduction Hive Metastore offers Iceberg REST API endpoints
for clients native to Apache Iceberg. Consequently, Iceberg users can access
Iceberg tables via either Hive Metastore Thrift API (using HiveCatalog) or
Iceberg REST Catalog API.\nBasic configurations You must configure the
following parameters.\nKey Required? Default Value
metastore.catalog.servlet.port Yes -1 The port number to whi [...]
\ No newline at end of file