[ 
https://issues.apache.org/jira/browse/PARQUET-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated PARQUET-695:
--------------------------------
    Component/s: parquet-cpp

> C++: Better default encoding user experience
> --------------------------------------------
>
>                 Key: PARQUET-695
>                 URL: https://issues.apache.org/jira/browse/PARQUET-695
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Uwe L. Korn
>
> Currently the default encoding is PLAIN. Making dictionary encoding the
> default is probably the better choice, with the user able to select an
> alternative encoding that is used if the dictionary grows too large.
> The interface should be as follows:
>  * The user selects on a global and per-column basis whether we should
> attempt dictionary encoding for a column. Whether RLE_DICTIONARY or
> PLAIN_DICTIONARY is recorded in the metadata is hidden from the user.
>  * The user specifies a fallback (!= dictionary) encoding that is used if
> either dictionary encoding for a column is not desired or the dictionary
> grows beyond its size limit.
> As a recap, the current implementation selects the encoding solely based on
> the encoding variable. There is no fallback support if the dictionary grows
> too large. The only magic at the moment is that the user can supply either
> PLAIN_DICTIONARY or RLE_DICTIONARY, and the enum written to the metadata is
> the one suitable for the chosen Parquet version, not necessarily the one
> supplied by the user.
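
The proposed interface could be sketched roughly as follows. This is only an illustration of the two bullet points above, not the parquet-cpp API: every name here (EncodingOptions, FormatVersion, ChooseEncoding, the setters) is hypothetical, the 1 MiB default size limit is an arbitrary placeholder, and the mapping of PLAIN_DICTIONARY to format version 1 and RLE_DICTIONARY to version 2 is an assumption about what "suitable for the chosen Parquet version" means.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical types for illustration only; not the parquet-cpp API.
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY, DELTA_BINARY_PACKED };
enum class FormatVersion { V1, V2 };

inline bool IsDictionaryEncoding(Encoding e) {
  return e == Encoding::PLAIN_DICTIONARY || e == Encoding::RLE_DICTIONARY;
}

class EncodingOptions {
 public:
  // Global switch: should we attempt dictionary encoding by default?
  void set_dictionary_default(bool enabled) { dictionary_default_ = enabled; }

  // Per-column override of the global switch.
  void set_dictionary_for_column(const std::string& column, bool enabled) {
    dictionary_per_column_[column] = enabled;
  }

  // The fallback must not be a dictionary encoding; returns false if rejected.
  bool set_fallback(Encoding e) {
    if (IsDictionaryEncoding(e)) return false;
    fallback_ = e;
    return true;
  }

  void set_dictionary_size_limit(int64_t bytes) { dictionary_size_limit_ = bytes; }

  bool dictionary_enabled(const std::string& column) const {
    auto it = dictionary_per_column_.find(column);
    return it != dictionary_per_column_.end() ? it->second : dictionary_default_;
  }
  Encoding fallback() const { return fallback_; }
  int64_t dictionary_size_limit() const { return dictionary_size_limit_; }

 private:
  bool dictionary_default_ = true;  // dictionary encoding as the new default
  std::unordered_map<std::string, bool> dictionary_per_column_;
  Encoding fallback_ = Encoding::PLAIN;
  int64_t dictionary_size_limit_ = 1024 * 1024;  // arbitrary placeholder
};

// Hidden from the user: which dictionary enum ends up in the metadata.
// Assumed mapping: PLAIN_DICTIONARY for version 1, RLE_DICTIONARY for version 2.
inline Encoding DictionaryEncodingFor(FormatVersion version) {
  return version == FormatVersion::V1 ? Encoding::PLAIN_DICTIONARY
                                      : Encoding::RLE_DICTIONARY;
}

// Use the dictionary if it is wanted and still within its size limit;
// otherwise use the user-specified fallback encoding.
inline Encoding ChooseEncoding(const EncodingOptions& opts, const std::string& column,
                               FormatVersion version, int64_t dictionary_bytes) {
  if (opts.dictionary_enabled(column) &&
      dictionary_bytes <= opts.dictionary_size_limit()) {
    return DictionaryEncodingFor(version);
  }
  return opts.fallback();
}
```

In a real writer the size check would presumably happen while pages are being written, so the switch to the fallback encoding could occur partway through a column chunk; the sketch only shows the decision itself.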



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
