Ruslan Dautkhanov created PARQUET-966:
-----------------------------------------
Summary: Store `dictionary entries` of parquet columns that will
be used for joins
Key: PARQUET-966
URL: https://issues.apache.org/jira/browse/PARQUET-966
Project: Parquet
Issue Type: Improvement
Components: parquet-format
Affects Versions: format-2.3.1, 1.8.0
Reporter: Ruslan Dautkhanov
It would be great if Parquet stored `dictionary entries` for columns
that are marked as join columns.
When a column is used for a join (it could be a [surrogate
key|https://en.wikipedia.org/wiki/Surrogate_key] or a [natural
key|https://en.wikipedia.org/wiki/Natural_key]), the actual value of the column
matters less than whether two rows carry the same value.
So we could join directly on the `dictionary entries` instead of the values
and save CPU cycles (no need to decompress, etc.).
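
To illustrate the idea, here is a minimal sketch in plain Python (not Parquet code). It assumes both join columns are encoded against one shared dictionary, so that equal values always get equal integer codes; that shared dictionary, and the helper names `dictionary_encode` and `join_on_codes`, are assumptions made for this example and are not part of the Parquet format or any Parquet API.

```python
from collections import defaultdict


def dictionary_encode(values, dictionary):
    """Replace each value with its integer code.

    `dictionary` maps value -> code and is extended in place, so encoding
    two columns with the same dict gives them a shared code space
    (an assumption of this sketch).
    """
    codes = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return codes


def join_on_codes(left_codes, right_codes):
    """Inner hash join on integer codes.

    Returns (left_row, right_row) index pairs; the original values are
    never touched during the join.
    """
    index = defaultdict(list)
    for right_row, code in enumerate(right_codes):
        index[code].append(right_row)
    return [
        (left_row, right_row)
        for left_row, code in enumerate(left_codes)
        for right_row in index.get(code, [])
    ]


# Hypothetical data: two tables joined on a customer-name column.
left_values = ["alice", "bob", "alice", "carol"]
right_values = ["bob", "alice", "dave"]

shared_dictionary = {}
left_codes = dictionary_encode(left_values, shared_dictionary)
right_codes = dictionary_encode(right_values, shared_dictionary)

pairs = join_on_codes(left_codes, right_codes)
```

The join compares only small integers. If the two columns had separate dictionaries, the codes would not be comparable without first building a mapping between the dictionaries.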
Inspired by [Oracle In-memory columnar storage improvements in
12.2|https://blogs.oracle.com/In-Memory/entry/what_s_new_in_12]