[ 
https://issues.apache.org/jira/browse/PARQUET-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031377#comment-17031377
 ] 

Gabor Szadovszky commented on PARQUET-1784:
-------------------------------------------

[~garawalid],

The idea is to use a "root" key for the configuration and to append a 
{{#column.path}} suffix to that key to set the configuration for the related 
column only. So, to turn the bloom filters on for all columns:
{code:java}
conf.setBoolean("parquet.bloom.filter", true);
{code}
and to turn it off for a specific column:
{code:java}
conf.setBoolean("parquet.bloom.filter#column.path", false);
{code}
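
The lookup this implies could be sketched as follows, assuming a column-specific key takes precedence over the root key and that the root key (or a default) applies to every other column. {{ColumnWiseConfig}} and its methods are hypothetical names used only for illustration, not parquet-mr API, and a plain map stands in for the Hadoop configuration object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of parquet-mr.
final class ColumnWiseConfig {
    // Stand-in for the real configuration object: plain string keys and values.
    private final Map<String, String> conf = new HashMap<>();

    public void setBoolean(String key, boolean value) {
        conf.put(key, Boolean.toString(value));
    }

    // Assumed resolution order: "<rootKey>#<columnPath>", then "<rootKey>", then the default.
    public boolean getBoolean(String rootKey, String columnPath, boolean defaultValue) {
        String value = conf.get(rootKey + "#" + columnPath);
        if (value == null) {
            value = conf.get(rootKey);
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        ColumnWiseConfig conf = new ColumnWiseConfig();
        conf.setBoolean("parquet.bloom.filter", true);
        conf.setBoolean("parquet.bloom.filter#column.path", false);

        // The column-specific key overrides the root key.
        System.out.println(conf.getBoolean("parquet.bloom.filter", "column.path", false));  // false
        // Any other column falls back to the root key.
        System.out.println(conf.getBoolean("parquet.bloom.filter", "other.column", false)); // true
    }
}
```

With this ordering, the two settings above leave bloom filters on for every column except {{column.path}}.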

> Column-wise configuration
> -------------------------
>
>                 Key: PARQUET-1784
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1784
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>
> After adding some new statistics and encodings to Parquet, it is getting 
> very hard to be smart and choose the best configs automatically. For example, 
> for which columns should we save column indexes and/or bloom filters? Is it 
> worth using dictionary encoding for a column that we know will fall back to 
> another encoding?
> The idea of this feature is to allow the library user to fine-tune the 
> configuration by setting it column-wise. To support this, we extend the 
> existing configuration keys with a suffix that identifies the related column. 
> (From now on, new keys are introduced following the same syntax.)
>  \{key of the configuration}{{#}}\{column path in the file schema}
>  For example: {{parquet.enable.dictionary#column.path.col_1}}
> This Jira covers the framework to support the column-wise configuration with 
> the implementation of some existing configs where it makes sense (e.g. 
> {{parquet.enable.dictionary}}). Implementing new configurations is not part of 
> this effort.



