Ruslan Dautkhanov created PARQUET-966:
-----------------------------------------

             Summary: Store `dictionary entries` of parquet columns that will 
be used for joins
                 Key: PARQUET-966
                 URL: https://issues.apache.org/jira/browse/PARQUET-966
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-format
    Affects Versions: format-2.3.1, 1.8.0
            Reporter: Ruslan Dautkhanov


It would be great if Parquet stored `dictionary entries` for columns 
marked to be used for joins.

When a column is used for a join (it could be a [surrogate 
key|https://en.wikipedia.org/wiki/Surrogate_key] or a [natural 
key|https://en.wikipedia.org/wiki/Natural_key]), the actual value of the column 
is not so important: for an equality join, all that matters is whether two 
values match.

So we could join directly on `dictionary entries` instead of the decoded values 
and save CPU cycles (no need to decompress, etc.).
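
A rough sketch of the idea in plain Python, to make the proposal concrete. It does not use any Parquet API; `dictionary_encode` and `join_on_codes` are hypothetical helpers written only for this illustration. It assumes the two sides were dictionary-encoded independently, so the two dictionaries are reconciled once (one value comparison per distinct entry) and the per-row work then touches only integer codes.

```python
from collections import defaultdict


def dictionary_encode(values):
    """Return (dictionary, codes): the distinct values in first-seen order,
    plus one integer code per row pointing into that dictionary."""
    dictionary = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def join_on_codes(left_dict, left_codes, right_dict, right_codes):
    """Inner equality join returning (left_row, right_row) pairs.

    Actual values are compared only while reconciling the two dictionaries;
    the row-level loops work purely on integer codes.
    """
    # Translate every right-side code into the left-side code space.
    left_index = {v: i for i, v in enumerate(left_dict)}
    right_to_left = [left_index.get(v) for v in right_dict]  # None = no match

    # Group the left rows by their integer code.
    rows_by_code = defaultdict(list)
    for row, code in enumerate(left_codes):
        rows_by_code[code].append(row)

    pairs = []
    for r_row, r_code in enumerate(right_codes):
        l_code = right_to_left[r_code]
        if l_code is None:
            continue  # value absent from the left dictionary
        for l_row in rows_by_code[l_code]:
            pairs.append((l_row, r_row))
    return pairs
```

For example, encoding `["cust-b", "cust-a", "cust-b", "cust-c"]` and `["cust-a", "cust-b", "cust-d"]` and passing both results to `join_on_codes` compares at most three strings per side, however many rows each column holds.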

Inspired by [Oracle In-memory columnar storage improvements in 
12.2|https://blogs.oracle.com/In-Memory/entry/what_s_new_in_12]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
