[ 
https://issues.apache.org/jira/browse/HUDI-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra updated HUDI-3177:
----------------------------------
    Sprint: Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, 
Hudi-Sprint-Jan-31, Hudi-Sprint-Feb-7  (was: Hudi-Sprint-Jan-10, 
Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31)

> CREATE INDEX command
> --------------------
>
>                 Key: HUDI-3177
>                 URL: https://issues.apache.org/jira/browse/HUDI-3177
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: index, metadata
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> Users should be able to trigger index creation with a CREATE INDEX statement 
> or a CLI tool, supplying the options below for one or more partitions.
>  
> {code:java}
> CREATE [BLOOM | COL_STATS | SOME_INDEX_TYPE] INDEX ON TABLE [table_name]
> FOR COLUMNS (col1, col2, col3)
> WITH OPTION (<file_group_count>, <some_other_option>);{code}
>  
> This maps to the following Hudi configs:
> {code:java}
> METADATA_PREFIX + ".index.bloom.filter.file.group.count"
> METADATA_PREFIX + ".index.column.stats.file.group.count"
> METADATA_PREFIX + ".index.bloom.filter.for.columns" -> comma-separated column names
> METADATA_PREFIX + ".index.column.stats.for.columns" -> comma-separated column names{code}
> The CLI indexer tool will map user inputs to the same configs.
> By default, the bloom filter index covers only the record key, and the column 
> stats index covers all columns.
> For v0.11.0, our assumptions are:
>  # The file group count is static for all columns.
>  # The set of columns that have already been indexed is inferred from the MT 
> (metadata table) partition layout (see HUDI-3258).
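
As a rough illustration of the mapping described in the ticket, the sketch below
turns the parts of a CREATE INDEX statement into the four config keys listed
above. It is not taken from the Hudi code base: the class, enum, and toConfigs
method are hypothetical names, the value "hoodie.metadata" for METADATA_PREFIX
is an assumption (the ticket only names the constant), and leaving out the
".for.columns" key when no columns are given is just one possible way to fall
back to the defaults the ticket mentions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Hudi.
class CreateIndexConfigSketch {
    // Assumption: the ticket names METADATA_PREFIX but does not give its value.
    static final String METADATA_PREFIX = "hoodie.metadata";

    // The two index types the ticket lists configs for.
    enum IndexType {
        BLOOM("bloom.filter"),
        COL_STATS("column.stats");

        final String keyPart;

        IndexType(String keyPart) {
            this.keyPart = keyPart;
        }
    }

    // Builds the config entries for one CREATE INDEX request.
    static Map<String, String> toConfigs(IndexType type, List<String> columns, int fileGroupCount) {
        Map<String, String> configs = new LinkedHashMap<>();
        String base = METADATA_PREFIX + ".index." + type.keyPart;
        configs.put(base + ".file.group.count", Integer.toString(fileGroupCount));
        // With no explicit columns, the key is omitted so the defaults apply
        // (record key for bloom filters, all columns for column stats).
        if (!columns.isEmpty()) {
            configs.put(base + ".for.columns", String.join(",", columns));
        }
        return configs;
    }

    public static void main(String[] args) {
        // Corresponds to: CREATE COL_STATS INDEX ... FOR COLUMNS (col1, col2, col3) WITH OPTION (4)
        Map<String, String> configs =
            toConfigs(IndexType.COL_STATS, List.of("col1", "col2", "col3"), 4);
        configs.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A CLI indexer tool could call the same helper with values parsed from its
command-line flags, so that both entry points produce identical configs.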



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
