Re: [PR] [HUDI-7089] Add docs for new features in 1.0.0-beta [hudi]

via GitHub Tue, 14 Nov 2023 12:17:58 -0800


codope commented on code in PR #10087:
URL: https://github.com/apache/hudi/pull/10087#discussion_r1393221202



##########
website/docs/sql_ddl.md:
##########
@@ -178,6 +178,145 @@ TBLPROPERTIES (
 AS SELECT * FROM parquet_table;
 ```
 
+### Create Index
+
+Hudi supports creating and dropping indexes, including functional indexes, on 
a table.
+
+:::note
+Creating indexes through SQL is in preview in version 1.0.0-beta only. It will 
be generally available in version 1.0.0.
+Please report any issues you find either via [GitHub 
issues](https://github.com/apache/hudi/issues) or creating a 
[JIRA](https://issues.apache.org/jira/projects/HUDI/issues).
+:::
+
+```sql
+-- Create Index
+CREATE INDEX [IF NOT EXISTS] index_name ON [TABLE] table_name 
+[USING index_type] 
+(column_name1 [OPTIONS(key1=value1, key2=value2, ...)], column_name2 
[OPTIONS(key1=value1, key2=value2, ...)], ...) 
+[OPTIONS (key1=value1, key2=value2, ...)]
+
+-- Drop Index
+DROP INDEX [IF EXISTS] index_name ON [TABLE] table_name
+```
+
+- `index_name` is the name of the index to be created or dropped.
+- `table_name` is the name of the table on which the index is created or 
dropped.
+- `index_type` is the type of the index to be created. Currently, only 
`files`, `column_stats` and `bloom_filters` is supported.
+- `column_name` is the name of the column on which the index is created.
+- Both index and column on which the index is created can be qualified with 
some options in the form of key-value pairs.
+  We will see this with an example of functional index below. 
+
+#### Create Functional Index
+
+A [functional 
index](https://github.com/apache/hudi/blob/00ece7bce0a4a8d0019721a28049723821e01842/rfc/rfc-63/rfc-63.md)
 
+is an index on a function of a column. It is a new addition to Hudi's 
[multi-modal 
indexing](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
 
+subsystem which provides faster access method and also absorb partitioning as 
part of the indexing system. Let us see 
+some examples of creating a functional index.
+
+```sql
+-- Create a functional index on the column `ts` (unix epoch) of the table 
`hudi_table` using the function `from_unixtime` with the format `yyyy-MM-dd`
+CREATE INDEX IF NOT EXISTS ts_datestr ON hudi_table USING column_stats(ts) 
OPTIONS(func='from_unixtime', format='yyyy-MM-dd');
+-- Create a functional index on the column `ts` (timestamp in yyyy-MM-dd 
HH:mm:ss) of the table `hudi_table` using the function `hour`
+CREATE INDEX ts_hour ON hudi_table_func_index USING column_stats(ts) 
options(func='hour');
+```
+
+Few things to note:
+- The `func` option is required for creating functional index, and it should 
be a valid Spark SQL function. Currently, 
+  only the functions that take a single column as input are supported. Some 
useful functions that are supported are listed below.
+  - `identity`
+  - `from_unixtime`
+  - `date_format`
+  - `to_date`
+  - `to_timestamp`
+  - `year`
+  - `month`
+  - `day`
+  - `hour`
+  - `lower`
+  - `upper`
+  - `substring`
+  - `regexp_extract`
+  - `regexp_replace`
+  - `concat`
+  - `length`
+- Please check the syntax for the above functions in
+  the [Spark SQL 
documentation](https://spark.apache.org/docs/latest/sql-ref-functions.html) and 
provide the options
+  accordingly. For example, the `format` option is required for 
`from_unixtime` function.
+- UDFs are not supported.
+
+<details>
+  <summary>Example of creating and using functional index</summary>
+
+```sql
+DROP TABLE IF EXISTS hudi_table_func_index;

Review Comment:
   Actually `hudi_table_func_index` is the name of the table and not the index.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [HUDI-7089] Add docs for new features in 1.0.0-beta [hudi]

Reply via email to