[ 
https://issues.apache.org/jira/browse/HUDI-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-3177:
------------------------------
    Description: 
Users should be able to trigger index creation using a CREATE INDEX statement or 
a CLI tool that captures the options below for one or more partitions.
 
{code:java}
CREATE [BLOOM | COL_STATS | SOME_INDEX_TYPE] INDEX ON TABLE  [table_name] FOR 
COLUMNS (col1, col2, col3) WITH OPTION  (<file_group_count>, 
<some_other_option>);{code}
 
This maps to the following Hudi configs:
{code:java}
METADATA_PREFIX + ".index.bloom.filter.file.group.count"
METADATA_PREFIX + ".index.column.stats.file.group.count"
METADATA_PREFIX + ".index.bloom.filter.for.columns" -> comma-separated column names
METADATA_PREFIX + ".index.column.stats.for.columns" -> comma-separated column names{code}
The CLI indexer tool will also map user inputs to the above configs.
By default, the bloom filter index covers only the record key, and column stats 
cover all columns.
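
As a rough illustration of the mapping above, the following Java sketch builds the config entries for a single statement. The class, enum, and method names are hypothetical, and the value "hoodie.metadata" for METADATA_PREFIX is an assumption. Only the key suffixes are taken from this description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and not taken from the Hudi code base.
final class CreateIndexConfigSketch {

    // Assumed value of METADATA_PREFIX; verify against the Hudi metadata config class.
    static final String METADATA_PREFIX = "hoodie.metadata";

    // Index types named in the proposed statement, with the key segment each one uses.
    enum IndexType {
        BLOOM("bloom.filter"),
        COL_STATS("column.stats");

        final String configSegment;

        IndexType(String configSegment) {
            this.configSegment = configSegment;
        }
    }

    // Translates the parsed parts of a CREATE INDEX statement into config entries.
    static Map<String, String> toConfigs(IndexType type, List<String> columns, int fileGroupCount) {
        if (fileGroupCount <= 0) {
            throw new IllegalArgumentException("file group count must be positive");
        }
        String base = METADATA_PREFIX + ".index." + type.configSegment;
        Map<String, String> configs = new LinkedHashMap<>();
        configs.put(base + ".file.group.count", Integer.toString(fileGroupCount));
        // An empty column list sets no key, so the default described above applies.
        if (!columns.isEmpty()) {
            configs.put(base + ".for.columns", String.join(",", columns));
        }
        return configs;
    }

    public static void main(String[] args) {
        toConfigs(IndexType.COL_STATS, List.of("col1", "col2", "col3"), 2)
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A CLI indexer could reuse the same translation step, so that both entry points produce identical configs for the same inputs.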

For v0.11.0, our assumptions are:
 # Static file group count for all columns.
 # Infer the set of columns that have already been indexed from the metadata 
table (MT) partition layout.

  was:
Users should be able to trigger index creation using CREATE INDEX statement for 
one or more partitions.
 
{code:java}
CREATE [BLOOM | COL_STATS | SOME_INDEX_TYPE] INDEX ON TABLE  [table_name] FOR 
COLUMNS (col1, col2, col3) WITH OPTION  (<file_group_count>, 
<some_other_option>);{code}
 
Maps to following hudi configs:
{code:java}
METADATA_PREFIX + ".index.bloom.filter.file.group.count"
METADATA_PREFIX + ".index.column.stats.file.group.count"
METADATA_PREFIX + ".index.bloom.filter.for.columns" -> comma-separated column names
METADATA_PREFIX + ".index.column.stats.for.columns" -> comma-separated column names{code}
Even the CLI indexer tool will map user inputs to the above configs.
By default, bloom filter will only be for record key and column stats will be 
for all columns.


> CREATE INDEX command
> --------------------
>
>                 Key: HUDI-3177
>                 URL: https://issues.apache.org/jira/browse/HUDI-3177
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: index, metadata
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> Users should be able to trigger index creation using a CREATE INDEX statement 
> or a CLI tool that captures the options below for one or more partitions.
>  
> {code:java}
> CREATE [BLOOM | COL_STATS | SOME_INDEX_TYPE] INDEX ON TABLE  [table_name] FOR 
> COLUMNS (col1, col2, col3) WITH OPTION  (<file_group_count>, 
> <some_other_option>);{code}
>  
> This maps to the following Hudi configs:
> {code:java}
> METADATA_PREFIX + ".index.bloom.filter.file.group.count"
> METADATA_PREFIX + ".index.column.stats.file.group.count"
> METADATA_PREFIX + ".index.bloom.filter.for.columns" -> comma-separated column names
> METADATA_PREFIX + ".index.column.stats.for.columns" -> comma-separated column names{code}
> The CLI indexer tool will also map user inputs to the above configs.
> By default, the bloom filter index covers only the record key, and column stats 
> cover all columns.
> For v0.11.0, our assumptions are:
>  # Static file group count for all columns.
>  # Infer the set of columns that have already been indexed from the metadata 
> table (MT) partition layout.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
