[ https://issues.apache.org/jira/browse/KYLIN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI updated KYLIN-2243:
--------------------------------
Description:
TopNCounterSerializer.maxLength() and
TopNCounterSerializer.getStorageBytesEstimate() can be inaccurate, especially
when one TopN measure has multiple "group by" columns and some of them use a
long byte encoding such as "fixed_length:16".
The inaccurate estimate may cause memory problems during in-mem cubing, and it
also makes the estimate of the final cube size inaccurate.
The root cause is that a data type like "top(100)" carries no information about
how long a key can be. So far a default value of 4 is used, which is too small
when the encoding is something like "fixed_length:16". The solution is to extend
the data type expression to "top(100, 16)" to indicate that one key can be 16
bytes long. If the "scale" (the second argument) is absent, 8 bytes is used as
the default.
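A minimal sketch of the proposed parsing rule, for illustration only. The class
TopNTypeSketch, its methods, and the per-counter cost (key length plus one
8-byte value) are assumptions made for this example; they are not Kylin's
DataType or TopNCounterSerializer code, whose real formula may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration; not Kylin's actual implementation. */
final class TopNTypeSketch {
    // Default key length in bytes when the second argument ("scale") is absent,
    // as proposed in the description.
    static final int DEFAULT_KEY_LENGTH = 8;
    // Assumption for this sketch: each counter stores one 8-byte value next to its key.
    static final int COUNTER_VALUE_BYTES = 8;

    // Matches "top(100)" and "top(100, 16)"; the second number is optional.
    private static final Pattern TOP_TYPE =
            Pattern.compile("top\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)");

    final int capacity;   // number of counters, e.g. 100
    final int keyLength;  // bytes per key, e.g. 16

    private TopNTypeSketch(int capacity, int keyLength) {
        this.capacity = capacity;
        this.keyLength = keyLength;
    }

    /** Parses a TopN data type expression, falling back to the default key length. */
    static TopNTypeSketch parse(String expr) {
        Matcher m = TOP_TYPE.matcher(expr.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a TopN data type: " + expr);
        }
        int capacity = Integer.parseInt(m.group(1));
        int keyLength = m.group(2) == null
                ? DEFAULT_KEY_LENGTH
                : Integer.parseInt(m.group(2));
        return new TopNTypeSketch(capacity, keyLength);
    }

    /** Rough size estimate under the assumed per-counter cost. */
    long estimateBytes() {
        return (long) capacity * (keyLength + COUNTER_VALUE_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(parse("top(100, 16)").estimateBytes()); // prints 2400
        System.out.println(parse("top(100)").estimateBytes());     // prints 1600
    }
}
```

Under the same assumed cost, a 4-byte key would give 100 * (4 + 8) = 1200
bytes, half of the 2400 bytes needed for 16-byte keys, which illustrates how a
too-small default can underestimate memory use.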
was:
TopNCounterSerializer.maxLength() and
TopNCounterSerializer.getStorageBytesEstimate() can be inaccurate, especially
when one TopN measure has multiple "group by" columns and some of them use a
long byte encoding such as "fixed_length:16".
The inaccurate estimate may cause memory problems during in-mem cubing, and it
also makes the estimate of the final cube size inaccurate.
> TopN memory estimation is inaccurate in some cases
> --------------------------------------------------
>
> Key: KYLIN-2243
> URL: https://issues.apache.org/jira/browse/KYLIN-2243
> Project: Kylin
> Issue Type: Bug
> Reporter: Shaofeng SHI
> Fix For: Backlog
>