[ 
https://issues.apache.org/jira/browse/PARQUET-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032465#comment-17032465
 ] 

Walid Gara edited comment on PARQUET-1784 at 2/7/20 3:38 PM:
-------------------------------------------------------------

[~gszadovszky]

I absolutely agree with you, we don't need to follow other projects in the 
Hadoop ecosystem.

It seems that *conf.set* is better than *conf.setStrings*.

Let me know if you need anything, I'll be happy to help.

> Column-wise configuration
> -------------------------
>
>                 Key: PARQUET-1784
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1784
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>              Labels: pull-request-available
>
> After adding some new statistics and encodings into Parquet it is getting 
> very hard to be smart and choose the best configs automatically. For example 
> for which columns should we save column index and/or bloom-filters? Is it 
> worth using dictionary for a column that we know will fall back to another 
> encoding?
> The idea of this feature is to allow the library user to fine-tune the 
> configuration by setting it column-wise. To support this we extend the 
> existing configuration keys by a suffix to identify the related column. (From 
> now on we introduce new keys following the same syntax.)
>  \{key of the configuration}{{#}}\{column path in the file schema}
>  For example: {{parquet.enable.dictionary#column.path.col_1}}
> This jira covers the framework to support the column-wise configuration with 
> the implementation of some existing configs where it makes sense (e.g. 
> {{parquet.enable.dictionary}}). Implementing new configurations is not part of 
> this effort.
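
To make the key syntax above concrete, here is a minimal sketch of how a column-wise lookup could work. It is not the parquet-mr implementation: the class name `ColumnConfigSketch`, its methods, the use of a plain `Map` in place of a Hadoop `Configuration`, and the fallback order (column-specific key first, then the file-wide key, then a default) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; names and fallback order are assumptions.
public class ColumnConfigSketch {
    // Separator between the base key and the column path, as described in the issue.
    static final String SEPARATOR = "#";

    // Builds "{key of the configuration}#{column path in the file schema}".
    static String columnKey(String key, String columnPath) {
        return key + SEPARATOR + columnPath;
    }

    // Looks up the column-specific value first, then the file-wide value,
    // and finally falls back to the supplied default.
    static boolean getBoolean(Map<String, String> conf, String key,
                              String columnPath, boolean defaultValue) {
        String value = conf.get(columnKey(key, columnPath));
        if (value == null) {
            value = conf.get(key);
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // File-wide setting: dictionary encoding enabled.
        conf.put("parquet.enable.dictionary", "true");
        // Column-wise override using the key from the issue description.
        conf.put("parquet.enable.dictionary#column.path.col_1", "false");

        // col_1 has its own entry; col_2 falls back to the file-wide key.
        System.out.println(getBoolean(conf, "parquet.enable.dictionary", "column.path.col_1", true));
        System.out.println(getBoolean(conf, "parquet.enable.dictionary", "column.path.col_2", true));
    }
}
```

With a real Hadoop `Configuration`, the same suffixed key would presumably be set as a plain string, for example via `conf.set(...)`, which matches the preference expressed in the comment above.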



--
This message was sent by Atlassian Jira
(v8.3.4#803005)