Gabor Szadovszky created PARQUET-1784:
-----------------------------------------
Summary: Column-wise configuration
Key: PARQUET-1784
URL: https://issues.apache.org/jira/browse/PARQUET-1784
Project: Parquet
Issue Type: New Feature
Components: parquet-mr
Reporter: Gabor Szadovszky
Assignee: Gabor Szadovszky
After adding new statistics and encodings to Parquet, it is getting very hard
to choose the best configuration automatically. For example, for which columns
should we save a column index and/or bloom filters? Is it worth using
dictionary encoding for a column that we know will fall back to another encoding?
The idea of this feature is to allow the library user to fine-tune the
configuration by setting it column-wise. To support this we extend the existing
configuration keys by a suffix to identify the related column. (New keys
introduced from now on will follow the same syntax.)
{key of the configuration}#{column path or column index in the projection}
For example: {{parquet.enable.dictionary#column.path.col_1}} or
{{parquet.enable.dictionary#3}}
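To illustrate how such suffixed keys might be resolved, here is a minimal Java sketch. It is not the parquet-mr implementation: the class ColumnConfigSketch and its methods are hypothetical, the configuration is modelled as a plain Map, and both the lookup order (column path, then column index, then the plain key) and the fallback to the un-suffixed key are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of column-wise key resolution; not the parquet-mr API. */
public class ColumnConfigSketch {
    // Separator between the configuration key and the column identifier.
    private static final String SEPARATOR = "#";

    /**
     * Returns the value of the given key that applies to one column.
     * Assumed lookup order: key#path, then key#index, then the plain key.
     */
    public static String resolve(Map<String, String> conf, String key,
                                 String columnPath, int columnIndex) {
        String byPath = conf.get(key + SEPARATOR + columnPath);
        if (byPath != null) {
            return byPath;
        }
        String byIndex = conf.get(key + SEPARATOR + columnIndex);
        if (byIndex != null) {
            return byIndex;
        }
        // May be null when nothing is configured for this key at all.
        return conf.get(key);
    }

    /** Boolean convenience wrapper that applies a default when no value is set. */
    public static boolean resolveBoolean(Map<String, String> conf, String key,
                                         String columnPath, int columnIndex,
                                         boolean defaultValue) {
        String value = resolve(conf, key, columnPath, columnIndex);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("parquet.enable.dictionary", "true");
        conf.put("parquet.enable.dictionary#column.path.col_1", "false");
        conf.put("parquet.enable.dictionary#3", "false");

        String key = "parquet.enable.dictionary";
        // Column matched by its path.
        System.out.println(resolveBoolean(conf, key, "column.path.col_1", 0, true));
        // Column matched by its index in the projection.
        System.out.println(resolveBoolean(conf, key, "column.path.col_2", 3, true));
        // Column with no override: the plain key applies.
        System.out.println(resolveBoolean(conf, key, "column.path.col_2", 1, true));
    }
}
```

Keeping the un-suffixed key as the final fallback would let existing configurations keep working unchanged, which seems consistent with extending the existing keys rather than replacing them.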
This Jira covers the framework to support column-wise configuration, along with
the implementation for some existing configs where it makes sense (e.g.
{{parquet.enable.dictionary}}). Implementing new configurations is not part of
this effort.