Uwe L. Korn created PARQUET-695:
-----------------------------------

             Summary: C++: Better default encoding user experience
                 Key: PARQUET-695
                 URL: https://issues.apache.org/jira/browse/PARQUET-695
             Project: Parquet
          Issue Type: Improvement
            Reporter: Uwe L. Korn


Currently the default encoding is PLAIN. Making dictionary encoding the default
is probably the best choice, while letting the user select an alternative
encoding to use if the dictionary grows too large.
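The proposed default behaviour can be sketched as follows: try dictionary encoding first, then switch to a fallback once the dictionary passes a byte limit. This is a minimal, hypothetical illustration. The class `DictionaryFallbackWriter`, its limit parameter and the two-value `Encoding` enum are invented for this example and are not parquet-cpp's API. The sketch assumes that values already dictionary-encoded stay that way (as earlier pages of a column chunk would) and that only later values use PLAIN.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical encoding enum for this sketch only (not parquet-cpp's Encoding).
enum class Encoding { PLAIN, DICTIONARY };

// Hypothetical string column writer: dictionary-encodes values until the
// dictionary would exceed `dictionary_limit_bytes`, then falls back to PLAIN.
class DictionaryFallbackWriter {
 public:
  explicit DictionaryFallbackWriter(std::size_t dictionary_limit_bytes)
      : limit_(dictionary_limit_bytes) {}

  void Write(const std::string& value) {
    if (fallen_back_) {
      plain_values_.push_back(value);
      return;
    }
    auto it = dictionary_.find(value);
    if (it == dictionary_.end()) {
      // A new distinct value would grow the dictionary past its limit.
      if (dictionary_bytes_ + value.size() > limit_) {
        fallen_back_ = true;  // Earlier indices stay dictionary-encoded.
        plain_values_.push_back(value);
        return;
      }
      int index = static_cast<int>(dictionary_values_.size());
      it = dictionary_.emplace(value, index).first;
      dictionary_values_.push_back(value);
      dictionary_bytes_ += value.size();
    }
    indices_.push_back(it->second);
  }

  Encoding current_encoding() const {
    return fallen_back_ ? Encoding::PLAIN : Encoding::DICTIONARY;
  }
  std::size_t dictionary_size() const { return dictionary_values_.size(); }
  std::size_t num_dictionary_encoded() const { return indices_.size(); }
  std::size_t num_plain() const { return plain_values_.size(); }

 private:
  std::size_t limit_;
  std::size_t dictionary_bytes_ = 0;
  bool fallen_back_ = false;
  std::unordered_map<std::string, int> dictionary_;
  std::vector<std::string> dictionary_values_;
  std::vector<int> indices_;                // dictionary-encoded values
  std::vector<std::string> plain_values_;   // values written after fallback
};
```

With a limit of 8 bytes, writing "aa", "bb", "aa", "ccc" fits in a 3-entry dictionary. The next new value "dddd" would bring the dictionary to 11 bytes, so the writer switches to PLAIN from that value on.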

The interface should be as follows:

 * The user selects, globally and per column, whether we should attempt to
dictionary-encode a column. Whether RLE_DICTIONARY or PLAIN_DICTIONARY is
written to the metadata is hidden from the user.
 * The user specifies a fallback (non-dictionary) encoding that is used if
either dictionary encoding is not desired for a column or the dictionary has
exceeded its size limit.
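A properties object following the two bullet points above might look like the sketch below. It has a global dictionary switch, per-column overrides and a single non-dictionary fallback. It is a hypothetical illustration only. `EncodingProperties`, `set_dictionary_default`, `set_dictionary`, `set_fallback` and `EncodingFor` are invented names, not parquet-cpp's actual `WriterProperties` interface. The single `DICTIONARY` value stands in for the RLE_DICTIONARY/PLAIN_DICTIONARY choice the user never sees.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical encoding values for this sketch; DICTIONARY hides the
// RLE_DICTIONARY vs PLAIN_DICTIONARY distinction from the user.
enum class Encoding { PLAIN, DELTA_BINARY_PACKED, DICTIONARY };

class EncodingProperties {
 public:
  // Global default: whether to attempt dictionary encoding for all columns.
  EncodingProperties& set_dictionary_default(bool enabled) {
    dictionary_default_ = enabled;
    return *this;
  }
  // Per-column override of the global default.
  EncodingProperties& set_dictionary(const std::string& column, bool enabled) {
    per_column_[column] = enabled;
    return *this;
  }
  // Fallback encoding; rejected if it is itself a dictionary encoding.
  EncodingProperties& set_fallback(Encoding encoding) {
    if (encoding == Encoding::DICTIONARY) {
      throw std::invalid_argument("fallback must not be a dictionary encoding");
    }
    fallback_ = encoding;
    return *this;
  }
  bool dictionary_enabled(const std::string& column) const {
    auto it = per_column_.find(column);
    return it != per_column_.end() ? it->second : dictionary_default_;
  }
  // Encoding to use for `column`, given whether its dictionary overflowed.
  Encoding EncodingFor(const std::string& column, bool dictionary_overflowed) const {
    if (dictionary_enabled(column) && !dictionary_overflowed) {
      return Encoding::DICTIONARY;
    }
    return fallback_;
  }

 private:
  bool dictionary_default_ = true;  // dictionary encoding on by default
  std::map<std::string, bool> per_column_;
  Encoding fallback_ = Encoding::PLAIN;
};
```

In this design, the fallback applies both to columns where the user disabled dictionary encoding and to columns whose dictionary overflowed. That matches the second bullet point.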

As a recap, the current implementation selects the encoding based solely on the
encoding variable. No fallback is implemented for when the dictionary grows too
large. The only magic at the moment is that the user can supply either
PLAIN_DICTIONARY or RLE_DICTIONARY, and the enum written to the metadata is the
one suitable for the chosen Parquet version, not necessarily the one supplied
by the user.
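The version-dependent "magic" described above amounts to a small mapping function. This is a hypothetical sketch, and `MetadataEncoding`, `IsDictionary` and the enums are illustrative names only. It assumes that Parquet 1.0 files record PLAIN_DICTIONARY and Parquet 2.0 files record RLE_DICTIONARY for dictionary-encoded data pages, in line with the Parquet format specification deprecating PLAIN_DICTIONARY in favour of RLE_DICTIONARY.

```cpp
#include <cassert>

// Hypothetical enums for this sketch (not parquet-cpp's definitions).
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY };
enum class ParquetVersion { PARQUET_1_0, PARQUET_2_0 };

// True for either dictionary spelling the user may pass in.
bool IsDictionary(Encoding e) {
  return e == Encoding::PLAIN_DICTIONARY || e == Encoding::RLE_DICTIONARY;
}

// Maps the user-supplied encoding to the enum recorded in the metadata:
// non-dictionary encodings pass through unchanged, and either dictionary
// spelling is replaced by the one assumed valid for the target version.
Encoding MetadataEncoding(Encoding requested, ParquetVersion version) {
  if (!IsDictionary(requested)) return requested;
  return version == ParquetVersion::PARQUET_1_0 ? Encoding::PLAIN_DICTIONARY
                                                : Encoding::RLE_DICTIONARY;
}
```

Under the proposal, this mapping would stay internal. The user would only choose whether to attempt dictionary encoding.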



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
