[
https://issues.apache.org/jira/browse/PARQUET-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032465#comment-17032465
]
Walid Gara edited comment on PARQUET-1784 at 2/7/20 3:38 PM:
-------------------------------------------------------------
[~gszadovszky]
I absolutely agree with you: we don't need to follow other projects in the
Hadoop ecosystem.
It seems that *conf.set* is better than *conf.setStrings*.
Let me know if you need anything; I'll be happy to help.
> Column-wise configuration
> -------------------------
>
> Key: PARQUET-1784
> URL: https://issues.apache.org/jira/browse/PARQUET-1784
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Priority: Major
> Labels: pull-request-available
>
> After adding some new statistics and encodings to Parquet, it is getting
> very hard to be smart and choose the best configs automatically. For example,
> for which columns should we save column indexes and/or bloom filters? Is it
> worth using dictionary encoding for a column that we know will fall back to
> another encoding?
> The idea of this feature is to allow the library user to fine-tune the
> configuration by setting it column-wise. To support this we extend the
> existing configuration keys by a suffix to identify the related column. (From
> now on we introduce new keys following the same syntax.)
> {key of the configuration}#{column path in the file schema}
> For example: {{parquet.enable.dictionary#column.path.col_1}}
> This Jira covers the framework to support the column-wise configuration with
> the implementation of some existing configs where it makes sense (e.g.
> {{parquet.enable.dictionary}}). Implementing new configurations is not part of
> this effort.
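To make the key syntax from the issue description concrete, here is a minimal Java sketch. It is not the parquet-mr implementation: the class `ColumnWiseConfig`, its methods, and the use of a plain `Map` in place of a Hadoop `Configuration` are all hypothetical, and the fallback order (column-specific key, then the general key, then a default) is an assumption that the description does not spell out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the "<key>#<column path>" syntax.
public class ColumnWiseConfig {
    // Separator between the base key and the column path, as given in the issue.
    static final String SEPARATOR = "#";

    // Builds the column-specific key, e.g. "parquet.enable.dictionary#column.path.col_1".
    static String columnKey(String key, String columnPath) {
        return key + SEPARATOR + columnPath;
    }

    // Assumed lookup order: column-specific value, then the general key, then the default.
    static boolean getBoolean(Map<String, String> conf, String key,
                              String columnPath, boolean defaultValue) {
        String value = conf.get(columnKey(key, columnPath));
        if (value == null) {
            value = conf.get(key);
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // General setting for all columns.
        conf.put("parquet.enable.dictionary", "true");
        // Override for a single column.
        conf.put(columnKey("parquet.enable.dictionary", "column.path.col_1"), "false");

        // col_1 has its own entry; col_2 falls back to the general key.
        System.out.println(getBoolean(conf, "parquet.enable.dictionary", "column.path.col_1", true));
        System.out.println(getBoolean(conf, "parquet.enable.dictionary", "column.path.col_2", true));
    }
}
```

With a real Hadoop `Configuration`, the same keys could be written with `conf.set(key, value)`, the method favoured in the comment above.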