[
https://issues.apache.org/jira/browse/HUDI-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar closed HUDI-8578.
--------------------------------
Resolution: Fixed
> Standardize SI/FI terminology and syntax
> -----------------------------------------
>
> Key: HUDI-8578
> URL: https://issues.apache.org/jira/browse/HUDI-8578
> Project: Apache Hudi
> Issue Type: Improvement
> Components: metadata
> Reporter: Vinoth Chandar
> Assignee: Lokesh Jain
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Original Estimate: 2m
> Remaining Estimate: 2m
>
> We have to ensure users intuitively understand what we are shipping. Let's
> align with the existing terminology of a popular database rather than
> inventing our own "names".
> Postgres uses "Expression indexes", and we can change our terminology to be
> the same.
> * References to "Functional Index" in docs/code/RFC change to "Expression
> Index".
> * "func" becomes "expr" (no need to change syntax with an "expression" clause
> like pg yet).
> * Change anything in storage, like the prefix "func_index_", the
> hoodiemetadata.avsc fields, or the index defs files (need to be really really
> thorough).
> Here's how we need the experience to be.
> {code:java}
> CREATE INDEX name ON table (primaryKeyColumn); -- should create RLI
> -- effectively; alternatively, `WHERE primaryKeyColumn = | IN ` has to work
> -- when RLI is enabled via Streamer/Datasource config. Need to test/confirm this.
> CREATE INDEX name ON table (someOtherColumn); -- should create
> -- secondary_index backed by RLI; remove secondary_index as an option in the
> -- USING clause; if no index type is specified, SI is the default.
> CREATE INDEX name ON table USING BLOOM_FILTERS(column) options(expr='lower');
> -- build a bloom filter from the lower-case version of column
> CREATE INDEX name ON table USING COL_STATS(column)
> options(expr='from_unixtime', format='yyyy-MM-dd'); -- build date-based
> -- indexing of ts column
> CREATE INDEX name ON table USING COL_STATS(column); -- should just add to
> -- the column stats? (or) should we disallow this for now?
> {code}
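To make the COL_STATS case with expr='from_unixtime' concrete, here is a
minimal sketch in plain Python of how min/max statistics over a derived value
could be used to skip files. It does not use any Hudi API. The helper names
(from_unixtime, build_expr_col_stats, prune_files), the file names, and the
sample values are all hypothetical, and the formatter assumes UTC and Python's
"%Y-%m-%d" pattern as a stand-in for format='yyyy-MM-dd'.

```python
from datetime import datetime, timezone


def from_unixtime(ts, fmt="%Y-%m-%d"):
    """Stand-in for a SQL from_unixtime: epoch seconds -> UTC date string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)


def build_expr_col_stats(files, expr):
    """Map each file name to the (min, max) of expr applied to its values."""
    stats = {}
    for name, values in files.items():
        derived = [expr(v) for v in values]
        stats[name] = (min(derived), max(derived))
    return stats


def prune_files(stats, target):
    """Return the files whose [min, max] range could contain target."""
    return sorted(name for name, (lo, hi) in stats.items() if lo <= target <= hi)


# Hypothetical data: file name -> epoch-second values of a ts column.
files = {
    "file_a": [1704067200, 1704070800],  # both on 2024-01-01 (UTC)
    "file_b": [1704153600, 1704240000],  # 2024-01-02 and 2024-01-03 (UTC)
}

stats = build_expr_col_stats(files, from_unixtime)
matching = prune_files(stats, "2024-01-02")
print(matching)
```

Because the statistics are stored on the derived date rather than on the raw
ts value, a predicate on the date can rule out file_a without reading it. The
string comparison is valid here only because the yyyy-MM-dd layout sorts
chronologically.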