Uwe L. Korn created PARQUET-695:
-----------------------------------
Summary: C++: Better default encoding user experience
Key: PARQUET-695
URL: https://issues.apache.org/jira/browse/PARQUET-695
Project: Parquet
Issue Type: Improvement
Reporter: Uwe L. Korn
Currently the default encoding is PLAIN. Making dictionary encoding the default
is probably the best choice, letting the user select an alternative encoding to
use if the dictionary grows too large.
The interface should be as follows:
* The user selects, globally and per column, whether we should attempt to
dictionary-encode a column. Whether RLE_DICTIONARY or PLAIN_DICTIONARY is
recorded in the metadata is hidden from the user.
* The user specifies a fallback (!= dictionary) encoding that is used if
either dictionary encoding is not desired for a column or the dictionary
has exceeded its size limit.
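
The two points above could be sketched roughly as follows. This is only an
illustration of the proposed interface, not existing parquet-cpp code: the
class EncodingOptions, its setters and getters, the helper ShouldUseDictionary
and the byte-based size limit are all hypothetical names and assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// All names here are hypothetical; this is not the existing parquet-cpp API.
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY, DELTA_BINARY_PACKED };

inline bool IsDictionaryEncoding(Encoding e) {
  return e == Encoding::PLAIN_DICTIONARY || e == Encoding::RLE_DICTIONARY;
}

class EncodingOptions {
 public:
  // Global default: should the writer attempt dictionary encoding?
  EncodingOptions& set_dictionary_default(bool enabled) {
    dictionary_default_ = enabled;
    return *this;
  }
  // Per-column override of the global default.
  EncodingOptions& set_dictionary(const std::string& column, bool enabled) {
    dictionary_by_column_[column] = enabled;
    return *this;
  }
  // Global fallback; it must not itself be a dictionary encoding.
  EncodingOptions& set_fallback_default(Encoding e) {
    assert(!IsDictionaryEncoding(e));
    fallback_default_ = e;
    return *this;
  }
  // Per-column fallback, with the same restriction.
  EncodingOptions& set_fallback(const std::string& column, Encoding e) {
    assert(!IsDictionaryEncoding(e));
    fallback_by_column_[column] = e;
    return *this;
  }

  bool dictionary_enabled(const std::string& column) const {
    auto it = dictionary_by_column_.find(column);
    return it != dictionary_by_column_.end() ? it->second : dictionary_default_;
  }
  Encoding fallback_for(const std::string& column) const {
    auto it = fallback_by_column_.find(column);
    return it != fallback_by_column_.end() ? it->second : fallback_default_;
  }

 private:
  bool dictionary_default_ = true;  // the default proposed in this issue
  Encoding fallback_default_ = Encoding::PLAIN;
  std::unordered_map<std::string, bool> dictionary_by_column_;
  std::unordered_map<std::string, Encoding> fallback_by_column_;
};

// True while the column should keep writing dictionary-encoded pages;
// once this returns false the writer would switch to fallback_for(column).
inline bool ShouldUseDictionary(const EncodingOptions& opts,
                                const std::string& column,
                                std::int64_t dictionary_bytes,
                                std::int64_t dictionary_limit_bytes) {
  return opts.dictionary_enabled(column) &&
         dictionary_bytes <= dictionary_limit_bytes;
}
```

Note that the user never names PLAIN_DICTIONARY or RLE_DICTIONARY in this
sketch; they only toggle dictionary encoding and pick a non-dictionary
fallback.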
As a recap, the current implementation selects the encoding solely based on the
encoding variable. No fallback is implemented for the case where the dictionary
grows too large. The only magic at the moment is that the user can supply either
PLAIN_DICTIONARY or RLE_DICTIONARY, and the enum written to the metadata is the
one suitable for the chosen Parquet version, not necessarily the one supplied by
the user.
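
The version-dependent enum selection described above might look like the
sketch below. The function name ResolveMetadataEncoding and the enum
definitions are hypothetical, and the mapping (PLAIN_DICTIONARY for format
version 1.0, RLE_DICTIONARY for 2.0) is an assumption based on the Parquet
format conventions rather than something stated in this issue.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real enums.
enum class ParquetVersion { PARQUET_1_0, PARQUET_2_0 };
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY };

// Returns the encoding enum to record in the metadata. Either dictionary enum
// supplied by the user is replaced by the one matching the format version;
// non-dictionary encodings pass through unchanged.
inline Encoding ResolveMetadataEncoding(Encoding requested, ParquetVersion version) {
  const bool is_dictionary = requested == Encoding::PLAIN_DICTIONARY ||
                             requested == Encoding::RLE_DICTIONARY;
  if (!is_dictionary) {
    return requested;
  }
  // Assumed mapping: 1.0 files use PLAIN_DICTIONARY, 2.0 files RLE_DICTIONARY.
  return version == ParquetVersion::PARQUET_1_0 ? Encoding::PLAIN_DICTIONARY
                                                : Encoding::RLE_DICTIONARY;
}

// Usage: the user's choice between the two dictionary enums is ignored.
inline void UsageExample() {
  assert(ResolveMetadataEncoding(Encoding::RLE_DICTIONARY,
                                 ParquetVersion::PARQUET_1_0) ==
         Encoding::PLAIN_DICTIONARY);
}
```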