[ https://issues.apache.org/jira/browse/PARQUET-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe L. Korn updated PARQUET-695:
--------------------------------
Component/s: parquet-cpp
> C++: Better default encoding user experience
> --------------------------------------------
>
> Key: PARQUET-695
> URL: https://issues.apache.org/jira/browse/PARQUET-695
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Uwe L. Korn
>
> Currently the default encoding is PLAIN. Making dictionary encoding the
> default is probably the best choice, with the user selecting an alternative
> encoding to use if the dictionary grows too large.
> The interface should be as follows:
> * The user selects, globally and per column, whether we should attempt to
> dictionary-encode a column. Whether RLE_DICTIONARY or PLAIN_DICTIONARY is
> written to the metadata is hidden from the user.
> * The user specifies a fallback (!= dictionary) encoding that is used if
> either dictionary encoding is not desired for a column or the dictionary
> exceeded its size limit.
> As a recap, the current implementation selects the encoding solely based on
> the encoding variable. No fallback is implemented for the case where the
> dictionary grows too large. The only magic at the moment is that the user can
> supply either PLAIN_DICTIONARY or RLE_DICTIONARY, and the enum written to the
> metadata is the one suitable for the chosen Parquet version, not the one
> supplied by the user.
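
The interface proposed above could be sketched roughly as follows. This is a minimal illustration only, not the parquet-cpp API: every name in it (EncodingPolicy, ColumnEncodingOptions, FormatVersion, choose, and the dict_bytes/dict_limit parameters) is hypothetical. It also assumes that PLAIN_DICTIONARY is the enum written for format version 1 and RLE_DICTIONARY for version 2.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical names throughout; this is NOT the parquet-cpp API.
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY, DELTA_BINARY_PACKED };
enum class FormatVersion { V1, V2 };

// What the user controls: whether to attempt dictionary encoding, and which
// non-dictionary encoding to fall back to.
struct ColumnEncodingOptions {
  bool try_dictionary = true;           // dictionary encoding as the default
  Encoding fallback = Encoding::PLAIN;  // must not be a dictionary encoding
};

class EncodingPolicy {
 public:
  // Global setting, used for every column without a per-column override.
  void set_default(const ColumnEncodingOptions& opts) {
    Validate(opts);
    default_ = opts;
  }

  // Per-column override, keyed by column path.
  void set_column(const std::string& path, const ColumnEncodingOptions& opts) {
    Validate(opts);
    per_column_[path] = opts;
  }

  // Picks the encoding for a column. The dictionary enum is derived from the
  // format version (assumption: V1 -> PLAIN_DICTIONARY, V2 -> RLE_DICTIONARY),
  // so the user never chooses between the two.
  Encoding choose(const std::string& path, FormatVersion version,
                  int64_t dict_bytes, int64_t dict_limit) const {
    auto it = per_column_.find(path);
    const ColumnEncodingOptions& opts =
        (it == per_column_.end()) ? default_ : it->second;
    if (!opts.try_dictionary || dict_bytes > dict_limit) {
      return opts.fallback;  // dictionary not desired, or it grew too large
    }
    return version == FormatVersion::V1 ? Encoding::PLAIN_DICTIONARY
                                        : Encoding::RLE_DICTIONARY;
  }

 private:
  // Rejects a dictionary encoding as the fallback (the "!= dictionary" rule).
  static void Validate(const ColumnEncodingOptions& opts) {
    if (opts.fallback == Encoding::PLAIN_DICTIONARY ||
        opts.fallback == Encoding::RLE_DICTIONARY) {
      throw std::invalid_argument("fallback must not be a dictionary encoding");
    }
  }

  ColumnEncodingOptions default_;
  std::unordered_map<std::string, ColumnEncodingOptions> per_column_;
};
```

In a real writer the switch to the fallback would have to happen while pages are being written, once the dictionary passes its limit. The sketch only captures the decision rule, not that mechanism.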
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)