[
https://issues.apache.org/jira/browse/PARQUET-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087069#comment-16087069
]
Dapeng Sun edited comment on PARQUET-1059 at 7/14/17 9:15 AM:
--------------------------------------------------------------
Hi [~xhochy],
{quote}
Can you describe a workload where this would bring a significant difference?
{quote}
In my case, the values of a column may be increasing or decreasing, but the
difference between adjacent values is very small, so their dictionary IDs may
also be adjacent or close together. If the IDs were encoded with delta encoding,
I think it would save more disk space.
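To put rough numbers on this workload, the following sketch compares two per-value bit widths for a short run of slowly increasing dictionary IDs. The first is the width needed to bit-pack the IDs as they are. The second is the width needed to bit-pack their deltas after subtracting the minimum delta. This is a hypothetical illustration and not parquet-mr code: the class and method names are invented. The delta side is also simplified to a single block. It leaves out the miniblocks, the header (block size, value count, first value) and the stored minimum delta that real delta binary packing writes.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not parquet-mr code): compares the bit width needed
 * to bit-pack dictionary IDs directly with the width needed after delta encoding
 * them, where each delta is stored relative to the minimum delta.
 */
public class DeltaVsBitPackWidth {

    // Number of bits needed to represent a non-negative value (0 needs 0 bits).
    static int bitWidth(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    // Width used when IDs are bit-packed as-is: driven by the largest ID.
    // Assumes IDs are non-negative, as dictionary IDs are.
    static int rawWidth(int[] ids) {
        int max = Arrays.stream(ids).max().orElse(0);
        return bitWidth(max);
    }

    // Width used for deltas after subtracting the minimum delta.
    // Simplified: one block, no miniblocks, header overhead ignored.
    static int deltaWidth(int[] ids) {
        if (ids.length < 2) {
            return 0;
        }
        long minDelta = Long.MAX_VALUE;
        long maxDelta = Long.MIN_VALUE;
        for (int i = 1; i < ids.length; i++) {
            long d = (long) ids[i] - ids[i - 1];
            minDelta = Math.min(minDelta, d);
            maxDelta = Math.max(maxDelta, d);
        }
        return bitWidth(maxDelta - minDelta);
    }

    public static void main(String[] args) {
        // Slowly increasing IDs: no repeats, so run-length encoding cannot help.
        int[] ids = {1000, 1001, 1003, 1004, 1006, 1007, 1009, 1010};
        System.out.println("raw bits/value:   " + rawWidth(ids));   // prints 10
        System.out.println("delta bits/value: " + deltaWidth(ids)); // prints 1
    }
}
```

For these IDs, raw bit-packing needs 10 bits per value and the adjusted deltas need only 1 bit per value. The header and the stored minimum delta add a fixed per-block cost that this sketch ignores, so the real saving depends on block size.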
was (Author: dapengsun):
Hi [~xhochy],
{quote}
Can you describe a workload where this would bring a significant difference?
{quote}
In my case, the values of a column may be increasing or decreasing, but the change
between adjacent values is very small, so the dictionary IDs may also be
adjacent or close together. If the ID encoding supported delta encoding, I think it would save
more disk space.
> Improve the RLE encoding for Parquet Dictionary IDs
> ---------------------------------------------------
>
> Key: PARQUET-1059
> URL: https://issues.apache.org/jira/browse/PARQUET-1059
> Project: Parquet
> Issue Type: Improvement
> Reporter: Dapeng Sun
>
> The IDs produced by Parquet dictionary encoding are written with
> {{RunLengthBitPackingHybridEncoder}}.
> RunLengthBitPackingHybridEncoder encodes values as either {{repeat}} (run-length)
> runs or {{bitpacking}} runs; we should improve it with an approach like the one
> in {{DeltaBinaryPackingWriter}}.
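For contrast, the following hypothetical sketch models how an RLE/bit-packing hybrid splits its input. Runs of at least 8 identical values become RLE runs, and everything else is bit-packed at the width of the largest value. The threshold of 8 is an assumption loosely modeled on {{RunLengthBitPackingHybridEncoder}}. The real encoder also groups bit-packed values in multiples of 8, which this sketch ignores. The class and method names are invented.

```java
/**
 * Hypothetical, simplified model of an RLE/bit-packing hybrid (not parquet-mr code):
 * runs of at least MIN_RLE_RUN identical values become RLE runs, everything else
 * is bit-packed.
 */
public class HybridRunSketch {
    // Assumed threshold; the real encoder's decision logic is more involved.
    static final int MIN_RLE_RUN = 8;

    // Returns how many of the input values would land in RLE runs.
    static int valuesCoveredByRle(int[] ids) {
        int covered = 0;
        int i = 0;
        while (i < ids.length) {
            // Extend j to the end of the run of values equal to ids[i].
            int j = i;
            while (j < ids.length && ids[j] == ids[i]) {
                j++;
            }
            int runLength = j - i;
            if (runLength >= MIN_RLE_RUN) {
                covered += runLength;
            }
            i = j;
        }
        return covered;
    }

    public static void main(String[] args) {
        int[] repeated = {5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 7, 7};
        int[] increasing = {1000, 1001, 1003, 1004, 1006, 1007, 1009, 1010};
        System.out.println(valuesCoveredByRle(repeated));   // prints 10
        System.out.println(valuesCoveredByRle(increasing)); // prints 0
    }
}
```

Repeated IDs mostly fall into an RLE run. The slowly increasing IDs described in the comment produce no runs at all. They therefore fall back entirely to bit-packing at the width of the largest ID, which is the case that a delta-based scheme targets.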
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)