[
https://issues.apache.org/jira/browse/HUDI-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-3177:
-----------------------------
Sprint: Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24,
Hudi-Sprint-Jan-31, Hudi-Sprint-Feb-7, Hudi-Sprint-Feb-14, Hudi-Sprint-Feb-22,
Hudi-Sprint-Mar-01 (was: Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18,
Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31, Hudi-Sprint-Feb-7, Hudi-Sprint-Feb-14,
Hudi-Sprint-Feb-22)
> CREATE INDEX command
> --------------------
>
> Key: HUDI-3177
> URL: https://issues.apache.org/jira/browse/HUDI-3177
> Project: Apache Hudi
> Issue Type: Task
> Components: index, metadata
> Reporter: Sagar Sumit
> Assignee: Sagar Sumit
> Priority: Blocker
> Fix For: 0.11.0
>
>
> Users should be able to trigger index creation with a CREATE INDEX statement
> or a CLI tool, specifying the options below for one or more partitions.
>
> {code:java}
> CREATE [BLOOM | COL_STATS | SOME_INDEX_TYPE] INDEX ON TABLE [table_name] FOR
> COLUMNS (col1, col2, col3) WITH OPTION (<file_group_count>,
> <some_other_option>);{code}
>
> This maps to the following Hudi configs:
> {code:java}
> METADATA_PREFIX + ".index.bloom.filter.file.group.count"
> METADATA_PREFIX + ".index.column.stats.file.group.count"
> METADATA_PREFIX + ".index.bloom.filter.for.columns" -> comma-separated column
> names
> METADATA_PREFIX + ".index.column.stats.for.columns" -> comma-separated column
> names{code}
> The CLI indexer tool will also map user inputs to the above configs.
> By default, the bloom filter index covers only the record key, and the column
> stats index covers all columns.
> For v0.11.0, our assumptions are:
> # The file group count is static for all columns.
> # The set of columns that have already been indexed is inferred from the
> metadata table (MT) partition layout (see HUDI-3258).
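
The mapping from CREATE INDEX options to configs described in the quoted issue could look roughly like the sketch below. It is only an illustration, not the Hudi implementation: the class `IndexConfigSketch` and the method `toConfigs` are hypothetical names, and the value `"hoodie.metadata"` for `METADATA_PREFIX` is an assumption that the issue text does not state. Only the key suffixes are taken from the description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Hudi): turns CREATE INDEX inputs into
// the config keys listed in the issue description.
final class IndexConfigSketch {
    // Assumption: the issue does not spell out the prefix value.
    static final String METADATA_PREFIX = "hoodie.metadata";

    static Map<String, String> toConfigs(String indexType, List<String> columns, int fileGroupCount) {
        // Pick the key infix for the requested index type.
        final String infix;
        switch (indexType.toUpperCase(Locale.ROOT)) {
            case "BLOOM":
                infix = ".index.bloom.filter";
                break;
            case "COL_STATS":
                infix = ".index.column.stats";
                break;
            default:
                throw new IllegalArgumentException("Unsupported index type: " + indexType);
        }
        if (fileGroupCount <= 0) {
            throw new IllegalArgumentException("File group count must be positive: " + fileGroupCount);
        }

        Map<String, String> configs = new LinkedHashMap<>();
        configs.put(METADATA_PREFIX + infix + ".file.group.count", Integer.toString(fileGroupCount));
        // An empty column list leaves the key unset so that the defaults from
        // the description apply (record key for bloom, all columns for column stats).
        if (!columns.isEmpty()) {
            configs.put(METADATA_PREFIX + infix + ".for.columns", String.join(",", columns));
        }
        return configs;
    }
}
```

Under these assumptions, `IndexConfigSketch.toConfigs("BLOOM", List.of("col1", "col2"), 4)` would yield a file group count entry of `4` and a comma-separated column entry of `col1,col2`. Leaving the columns key unset for an empty list is a design choice of the sketch, made so that the stated defaults stay in effect.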