[ 
https://issues.apache.org/jira/browse/PARQUET-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742672#comment-15742672
 ] 

Ryan Blue commented on PARQUET-796:
-----------------------------------

Dictionary encoding usually produces smaller output than delta encoding. 
However, the dictionary fallback check compares the dictionary-encoded size 
against what plain encoding would produce, not against delta, so the choice is 
biased toward dictionary encoding. Do you think the data would be smaller with 
delta rather than dictionary?
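
One way to make that question concrete is a rough size estimate for a sample 
INT64 column. The sketch below is illustrative only and is not parquet-mr 
code: the class and method names are hypothetical, and the size formulas are 
simplifying assumptions (no page headers, no block or miniblock layout, no 
RLE on dictionary indices). It models the bias described above by keeping 
dictionary whenever it beats plain, with delta only as the fallback.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: hypothetical names and simplified size formulas, not parquet-mr code.
class EncodingSizeSketch {

    // Bits needed to represent a non-negative value (0 needs 0 bits).
    static int bitWidth(long maxValue) {
        return 64 - Long.numberOfLeadingZeros(maxValue);
    }

    // Plain encoding of INT64: 8 bytes per value.
    static long plainBytes(long[] values) {
        return 8L * values.length;
    }

    // Assumed dictionary cost: 8 bytes per distinct value plus bit-packed indices.
    static long dictionaryBytes(long[] values) {
        Set<Long> distinct = new HashSet<>();
        for (long v : values) {
            distinct.add(v);
        }
        if (distinct.isEmpty()) {
            return 0;
        }
        int bitsPerIndex = bitWidth(distinct.size() - 1);
        long indexBytes = ((long) values.length * bitsPerIndex + 7) / 8;
        return 8L * distinct.size() + indexBytes;
    }

    // Assumed delta cost: first value and minimum delta (8 bytes each), plus
    // (delta - minDelta) bit-packed at one width. Assumes deltas do not overflow.
    static long deltaBytes(long[] values) {
        if (values.length == 0) {
            return 0;
        }
        if (values.length == 1) {
            return 8;
        }
        long minDelta = Long.MAX_VALUE;
        long maxDelta = Long.MIN_VALUE;
        for (int i = 1; i < values.length; i++) {
            long delta = values[i] - values[i - 1];
            minDelta = Math.min(minDelta, delta);
            maxDelta = Math.max(maxDelta, delta);
        }
        int bitsPerDelta = bitWidth(maxDelta - minDelta);
        long packedBytes = ((long) (values.length - 1) * bitsPerDelta + 7) / 8;
        return 16 + packedBytes;
    }

    // Models the bias: dictionary is kept whenever it beats plain; delta is only the fallback.
    static String chosenWithPlainBaseline(long[] values) {
        return dictionaryBytes(values) < plainBytes(values) ? "DICTIONARY" : "DELTA";
    }

    // Sample column: 0, 0, 1, 1, 2, 2, ... (slowly increasing, with repeats).
    static long[] pairedSequence(int count) {
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = i / 2;
        }
        return values;
    }

    public static void main(String[] args) {
        long[] values = pairedSequence(1000);
        System.out.println("plain      = " + plainBytes(values) + " bytes");
        System.out.println("dictionary = " + dictionaryBytes(values) + " bytes");
        System.out.println("delta      = " + deltaBytes(values) + " bytes");
        System.out.println("chosen     = " + chosenWithPlainBaseline(values));
    }
}
```

Under these assumptions, the 1000-value sample comes out at 8000 bytes plain, 
5125 bytes dictionary, and 141 bytes delta, yet the plain-based check still 
picks dictionary. Real columns and the real encoders will behave differently, 
so measuring actual file sizes is the only reliable answer.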

> Delta Encoding is not used when dictionary enabled
> --------------------------------------------------
>
>                 Key: PARQUET-796
>                 URL: https://issues.apache.org/jira/browse/PARQUET-796
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0
>            Reporter: Jakub Liska
>            Priority: Critical
>             Fix For: 1.9.1
>
>
> The current code does not allow Delta Encoding and Dictionary Encoding to 
> be used together. If I instantiate ParquetWriter like this:
> {code}
> val writer = new ParquetWriter[Group](outFile, new GroupWriteSupport, codec, 
> blockSize, pageSize, dictPageSize, enableDictionary = true, true, 
> ParquetProperties.WriterVersion.PARQUET_2_0, configuration)
> {code}
> then this piece of code:
> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultValuesWriterFactory.java#L78-L86
> causes DictionaryValuesWriter to be used instead of the inferred 
> DeltaLongEncodingWriter. 
> The original issue is here:
> https://github.com/apache/parquet-mr/pull/154#issuecomment-266489768



