[
https://issues.apache.org/jira/browse/HUDI-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sagar Sumit updated HUDI-3177:
------------------------------
Description:
Users should be able to trigger index creation using a CREATE INDEX statement
for one or more partitions.
{code:java}
CREATE [BLOOM | COL_STATS | SOME_INDEX_TYPE] INDEX ON TABLE [table_name] FOR
COLUMNS (col1, col2, col3) WITH OPTION (<file_group_count>,
<some_other_option>);{code}
This maps to the following Hudi configs:
{code:java}
METADATA_PREFIX + ".index.bloom.filter.file.group.count"
METADATA_PREFIX + ".index.column.stats.file.group.count"
METADATA_PREFIX + ".index.bloom.filter.for.columns" -> comma-separated column names
METADATA_PREFIX + ".index.column.stats.for.columns" -> comma-separated column names{code}
The CLI indexer tool will also map user inputs to the above configs.
By default, the bloom filter index covers only the record key, and the column
stats index covers all columns.
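
As a rough illustration of the mapping described above, the sketch below turns the parts of a CREATE INDEX statement into the four config keys. It is not the actual Hudi implementation: the class name, the enum, and the toConfigs helper are hypothetical, and the value "hoodie.metadata" for METADATA_PREFIX is an assumption. Leaving out the "for.columns" key when no columns are given is also an assumption about how the stated defaults would be applied.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CreateIndexConfigSketch {
    // Assumption: the real METADATA_PREFIX constant resolves to this value.
    public static final String METADATA_PREFIX = "hoodie.metadata";

    // Hypothetical enum: maps the index type keyword in the statement
    // to the key fragment used in the config names from this issue.
    public enum IndexType {
        BLOOM("bloom.filter"),
        COL_STATS("column.stats");

        final String keyPart;

        IndexType(String keyPart) {
            this.keyPart = keyPart;
        }
    }

    // Hypothetical helper: builds the config entries for one CREATE INDEX statement.
    public static Map<String, String> toConfigs(IndexType type, List<String> columns, int fileGroupCount) {
        Map<String, String> configs = new LinkedHashMap<>();
        String base = METADATA_PREFIX + ".index." + type.keyPart;
        configs.put(base + ".file.group.count", String.valueOf(fileGroupCount));
        // Assumption: with no explicit columns the key is omitted, so the defaults
        // (record key for bloom filter, all columns for column stats) apply.
        if (!columns.isEmpty()) {
            configs.put(base + ".for.columns", String.join(",", columns));
        }
        return configs;
    }

    public static void main(String[] args) {
        // Corresponds to: CREATE COL_STATS INDEX ... FOR COLUMNS (col1, col2, col3) with a file group count of 4
        Map<String, String> configs = toConfigs(IndexType.COL_STATS, Arrays.asList("col1", "col2", "col3"), 4);
        configs.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The CLI indexer tool could reuse the same helper, since it is expected to produce the same configs from user inputs.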
was:Users should be able to trigger index creation using CREATE INDEX
statement for one or more partitions.
> Support CREATE INDEX statement
> ------------------------------
>
> Key: HUDI-3177
> URL: https://issues.apache.org/jira/browse/HUDI-3177
> Project: Apache Hudi
> Issue Type: Task
> Components: index, metadata
> Reporter: Sagar Sumit
> Assignee: Sagar Sumit
> Priority: Blocker
> Fix For: 0.11.0
>
>
> Users should be able to trigger index creation using a CREATE INDEX statement
> for one or more partitions.
>
> {code:java}
> CREATE [BLOOM | COL_STATS | SOME_INDEX_TYPE] INDEX ON TABLE [table_name] FOR
> COLUMNS (col1, col2, col3) WITH OPTION (<file_group_count>,
> <some_other_option>);{code}
>
> This maps to the following Hudi configs:
> {code:java}
> METADATA_PREFIX + ".index.bloom.filter.file.group.count"
> METADATA_PREFIX + ".index.column.stats.file.group.count"
> METADATA_PREFIX + ".index.bloom.filter.for.columns" -> comma-separated column names
> METADATA_PREFIX + ".index.column.stats.for.columns" -> comma-separated column names{code}
> The CLI indexer tool will also map user inputs to the above configs.
> By default, the bloom filter index covers only the record key, and the column
> stats index covers all columns.