[https://issues.apache.org/jira/browse/KYLIN-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Xiaoxiang Yu updated KYLIN-4941:
--------------------------------
Fix Version/s: v3.2.0 (was: v3.1.2)
> Support encoding raw data to base cuboid column-by-column
> ---------------------------------------------------------
>
> Key: KYLIN-4941
> URL: https://issues.apache.org/jira/browse/KYLIN-4941
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Affects Versions: v3.1.1
> Reporter: ShengJun Zheng
> Priority: Major
> Fix For: v3.2.0
>
>
> When building with the Spark engine, the first step is to encode the rows of the
> Hive source table into base cuboid data.
> The existing implementation encodes the data row by row. If the cube has several
> dictionary-encoded measures, all of their dictionaries must be in use at the
> same time to encode a single row. This causes heavy memory usage and a low hit
> ratio in the dictionary cache.
> We optimized this case by encoding column by column, which brought a
> significant improvement for cubes with several high-cardinality,
> dictionary-encoded measures.
> We will refine the implementation based on Kylin 3.x and share it.
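The cache argument in the description can be illustrated with a small, self-contained Java sketch. It is not Kylin's Spark job code: the `DictCache` class, its LRU eviction policy, the capacity of 2, and the toy three-column table are hypothetical stand-ins for Kylin's dictionary cache and a Hive source table. With fewer cache slots than dictionary-encoded columns, row-by-row encoding cycles through every dictionary on every row and misses each time, while column-by-column encoding loads each dictionary only once.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingOrderSketch {

    /** Hypothetical LRU dictionary cache: "loads" a dictionary on a miss and counts loads. */
    static final class DictCache extends LinkedHashMap<Integer, Map<String, Integer>> {
        private final int capacity;
        private final Map<Integer, Map<String, Integer>> store; // stands in for dictionaries on disk
        int loads = 0;

        DictCache(int capacity, Map<Integer, Map<String, Integer>> store) {
            super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
            this.capacity = capacity;
            this.store = store;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Map<String, Integer>> eldest) {
            return size() > capacity; // evict the least recently used dictionary
        }

        Map<String, Integer> dict(int column) {
            Map<String, Integer> d = get(column);
            if (d == null) { // cache miss: load the dictionary and insert it
                loads++;
                d = store.get(column);
                put(column, d);
            }
            return d;
        }
    }

    /** Builds one value-to-id dictionary per column (ids in first-seen order). */
    static Map<Integer, Map<String, Integer>> buildDictionaries(String[][] rows) {
        Map<Integer, Map<String, Integer>> dicts = new HashMap<>();
        for (int c = 0; c < rows[0].length; c++) {
            Map<String, Integer> d = new HashMap<>();
            for (String[] row : rows) {
                d.putIfAbsent(row[c], d.size());
            }
            dicts.put(c, d);
        }
        return dicts;
    }

    /** Row by row: every row touches every column's dictionary. */
    static int[][] encodeRowByRow(String[][] rows, DictCache cache) {
        int[][] out = new int[rows.length][rows[0].length];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < rows[0].length; c++) {
                out[r][c] = cache.dict(c).get(rows[r][c]);
            }
        }
        return out;
    }

    /** Column by column: one dictionary serves a whole column before the next is needed. */
    static int[][] encodeColumnByColumn(String[][] rows, DictCache cache) {
        int[][] out = new int[rows.length][rows[0].length];
        for (int c = 0; c < rows[0].length; c++) {
            for (int r = 0; r < rows.length; r++) {
                out[r][c] = cache.dict(c).get(rows[r][c]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy table: 4 rows, 3 dictionary-encoded columns (hypothetical data).
        String[][] rows = {
            {"a", "x", "p"}, {"b", "y", "q"}, {"a", "y", "p"}, {"b", "x", "q"}
        };
        Map<Integer, Map<String, Integer>> store = buildDictionaries(rows);

        DictCache rowCache = new DictCache(2, store);
        int[][] byRow = encodeRowByRow(rows, rowCache);
        DictCache colCache = new DictCache(2, store);
        int[][] byCol = encodeColumnByColumn(rows, colCache);

        System.out.println("same encoding: " + Arrays.deepEquals(byRow, byCol)); // true
        System.out.println("row-by-row loads: " + rowCache.loads);                // 12
        System.out.println("column-by-column loads: " + colCache.loads);          // 3
    }
}
```

Both orders produce identical encoded values; only the access pattern differs. Holding one dictionary at a time is also what keeps memory bounded in the column-by-column approach, at the cost of materializing per-column results before they are reassembled into base cuboid rows (how Kylin does that reassembly is not described in this issue).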