This is an automated email from the ASF dual-hosted git repository.
okumin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hive-site.git
The following commit(s) were added to refs/heads/main by this push:
new 5c9a634e Fix some "Raw HTML omitted" warnings and formatting issues (#97)
5c9a634e is described below
commit 5c9a634ef73277d23e7ce7537cdd8a425877cd12
Author: Thomas Rebele <[email protected]>
AuthorDate: Tue Feb 24 13:18:12 2026 +0100
Fix some "Raw HTML omitted" warnings and formatting issues (#97)
---
.../desingdocs/column-statistics-in-hive.md | 7 +++-
.../Development/desingdocs/default-constraint.md | 6 ++--
content/Development/desingdocs/design.md | 2 +-
...lans-for-rawstore-partition-filter-condition.md | 38 ++--------------------
4 files changed, 13 insertions(+), 40 deletions(-)
diff --git a/content/Development/desingdocs/column-statistics-in-hive.md
b/content/Development/desingdocs/column-statistics-in-hive.md
index e6a9c3f1..50230987 100644
--- a/content/Development/desingdocs/column-statistics-in-hive.md
+++ b/content/Development/desingdocs/column-statistics-in-hive.md
@@ -34,6 +34,7 @@ describe formatted [table_name] [column_name];
To persist column level statistics, we propose to add the following new tables,
+```
CREATE TABLE TAB_COL_STATS
(
CS_ID NUMBER NOT NULL,
@@ -87,11 +88,13 @@ AVG_COL_LEN DOUBLE,
ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_PK PRIMARY KEY
(CS_ID);
ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_FK1 FOREIGN KEY
(PART_ID) REFERENCES PARTITIONS (PART_ID) INITIALLY DEFERRED;
+```
### **Metastore Thrift API**
We propose to add the following Thrift structs to transport column statistics:
+```
struct BooleanColumnStatsData {
1: required i64 numTrues,
2: required i64 numFalses,
@@ -185,9 +188,11 @@ struct ColumnStatistics {
1: required ColumnStatisticsDesc statsDesc,
2: required list<ColumnStatisticsObj> statsObj;
}
+```
We propose to add the following Thrift APIs to persist, retrieve and delete
column statistics:
+```
bool update_table_column_statistics(1:ColumnStatistics stats_obj) throws
(1:NoSuchObjectException o1,
2:InvalidObjectException o2, 3:MetaException o3, 4:InvalidInputException o4)
bool update_partition_column_statistics(1:ColumnStatistics stats_obj) throws
(1:NoSuchObjectException o1,
@@ -205,8 +210,8 @@ bool delete_partition_column_statistics(1:string db_name,
2:string tbl_name, 3:s
bool delete_table_column_statistics(1:string db_name, 2:string tbl_name,
3:string col_name) throws
(1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidObjectException o3,
4:InvalidInputException o4)
+```
Note that delete_column_statistics is needed to remove the entries from the
metastore when a table is dropped. Also note that currently Hive doesn't
support dropping columns.
Note that in V1 of the project, we will support only scalar statistics.
Furthermore, we will support only static partitions, i.e., both the partition
key and partition value should be specified in the analyze command. In a
following version, we will add support for height balanced histograms as well
as support for dynamic partitions in the analyze command for column level
statistics.
-
diff --git a/content/Development/desingdocs/default-constraint.md
b/content/Development/desingdocs/default-constraint.md
index 54ae4790..ea0aae51 100644
--- a/content/Development/desingdocs/default-constraint.md
+++ b/content/Development/desingdocs/default-constraint.md
@@ -27,10 +27,10 @@ DEFAULT will be a fifth addition to this list. Note that
unlike existing constra
CREATE TABLE will be updated to let user specify DEFAULT as follows:
* With column definition
-+ CREATE TABLE <tableName> (<columnName> <dataType> DEFAULT <defaultValue>)
+ * `CREATE TABLE <tableName> (<columnName> <dataType> DEFAULT <defaultValue>)`
* ~~With constraint specification~~
-+ ~~CREATE TABLE <tableName> (<columnName> <dataType>, …, CONSTRAINT
<constraintName> DEFAULT <defaultValue> (<columnName>)~~
+ * ~~`CREATE TABLE <tableName> (<columnName> <dataType>, …, CONSTRAINT
<constraintName> DEFAULT <defaultValue> (<columnName>)`~~
To be compliant with SQL standards, Hive will only permit default values which
fall in one of the following categories:
@@ -38,7 +38,7 @@ To be compliant with SQL standards, Hive will only permit
default values which f
* DATE TIME VALUE FUNCTION, that is, CURRENT_TIME, CURRENT_DATE
* CURRENT_USER()
* NULL
-* CAST (<expression in above category> as PRIMITIVE TYPE)
+* `CAST (<expression in above category> as PRIMITIVE TYPE)`
## INSERT
diff --git a/content/Development/desingdocs/design.md
b/content/Development/desingdocs/design.md
index fdb3d841..8b4ca2e1 100644
--- a/content/Development/desingdocs/design.md
+++ b/content/Development/desingdocs/design.md
@@ -27,7 +27,7 @@ Figure 1 also shows how a typical query flows through the
system. The UI calls t
Data in Hive is organized into:
* Tables – These are analogous to Tables in Relational Databases. Tables can
be filtered, projected, joined and unioned. Additionally all the data of a
table is stored in a directory in HDFS. Hive also supports the notion of
external tables wherein a table can be created on pre-existing files or
directories in HDFS by providing the appropriate location to the table creation
DDL. The rows in a table are organized into typed columns similar to Relational
Databases.
-* Partitions – Each Table can have one or more partition keys which determine
how the data is stored, for example a table T with a date partition column ds
had files with data for a particular date stored in the <table
location>/ds=<date> directory in HDFS. Partitions allow the system to prune
data to be inspected based on query predicates, for example a query that is
interested in rows from T that satisfy the predicate T.ds = '2008-09-01' would
only have to look at files in <table locat [...]
+* Partitions – Each Table can have one or more partition keys which determine
how the data is stored, for example a table T with a date partition column ds
had files with data for a particular date stored in the `<table
location>/ds=<date>` directory in HDFS. Partitions allow the system to prune
data to be inspected based on query predicates, for example a query that is
interested in rows from T that satisfy the predicate T.ds = '2008-09-01' would
only have to look at files in `<table lo [...]
* Buckets – Data in each partition may in turn be divided into Buckets based
on the hash of a column in the table. Each bucket is stored as a file in the
partition directory. Bucketing allows the system to efficiently evaluate
queries that depend on a sample of data (these are queries that use the SAMPLE
clause on the table).
Apart from primitive column types (integers, floating point numbers, generic
strings, dates and booleans), Hive also supports arrays and maps. Additionally,
users can compose their own types programmatically from any of the primitives,
collections or other user-defined types. The typing system is closely tied to
the SerDe (Serialization/Deserialization) and object inspector interfaces. Users
can create their own types by implementing their own object inspectors, and
using these object ins [...]
diff --git
a/content/Development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition.md
b/content/Development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition.md
index 9cef21f3..8aad01c7 100644
---
a/content/Development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition.md
+++
b/content/Development/desingdocs/hbase-execution-plans-for-rawstore-partition-filter-condition.md
@@ -129,6 +129,7 @@ Examples of conversion of query plan to hbase api calls
| Filter expression | HBase calls |
+|-|-|
| p1 > 10 and p1 < 20 | Scan(X10+, X20) |
| p1 = 10 (if single partition column) | Scan(X10, X10+). Optimized? :
Get(X10) |
| Similar case as above, if all partition columns are specified | |
@@ -162,77 +163,44 @@ ExpressionTree (existing) - TreeNodes for AND/OR
expressions. Leaf Node for leaf
Output:
+```
public static abstract class FilterPlan {
-
abstract FilterPlan and(FilterPlan other);
-
abstract FilterPlan or(FilterPlan other);
-
abstract List<ScanPlan> getPlans();
-
}
-
-
// represents a union of multiple ScanPlan
-
MultiScanPlan extends FilterPlan
-
ScanPlan extends FilterPlan
-
// represent Scan start
-
private ScanMarker startMarker ;
-
// represent Scan end
-
private ScanMarker endMarker ;
-
private ScanFilter filter;
-
-
public FilterPlan and(FilterPlan other) {
-
// calls this.and(otherScanPlan) on each scan plan in other
-
}
-
private ScanPlan and(ScanPlan other) {
-
// combines start marker and end marker and filters of this and other
-
}
-
public FilterPlan or(FilterPlan other) {
-
// just create a new FilterPlan from other, with this additional plan
-
}
-
-
PartitionFilterGenerator -
-
/**
-
* Visitor for ExpressionTree.
-
* It first generates the ScanPlan for the leaf nodes. The higher level
nodes are
-
* either AND or OR operations. It then calls FilterPlan.and and
FilterPlan.or with
-
* the child nodes to generate the plans for higher level nodes.
-
*/
-
-
-
-
+```
Initial implementation: Convert from ExpressionTree to HBase filter,
thereby implementing both getPartitionsByFilter and getPartitionsByExpr