[ https://issues.apache.org/jira/browse/KYLIN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI updated KYLIN-2243:
--------------------------------
    Description: 
TopNCounterSerializer.maxLength() and TopNCounterSerializer.getStorageBytesEstimate() might be inaccurate, especially when a TopN measure has multiple "group by" columns and some of them use a long-byte encoding such as "fixed_length:16".

The inaccurate estimation may cause memory issues when using in-mem cubing, and it also makes the estimation of the final cube size inaccurate.

The root cause is that a data type like "top(100)" carries no information about how long a key can be. So far a default value of 4 is used, which is too small when the encoding is something like "fixed_length:16". The solution is to extend the data type expression to "top(100, 16)", indicating that one key can be up to 16 bytes long. If the "scale" is absent, use 4 bytes as the default.

  was:
TopNCounterSerializer.maxLength() and TopNCounterSerializer.getStorageBytesEstimate() might be inaccurate, especially when a TopN measure has multiple "group by" columns and some of them use a long-byte encoding such as "fixed_length:16".

The inaccurate estimation may cause memory issues when using in-mem cubing, and it also makes the estimation of the final cube size inaccurate.

The root cause is that a data type like "top(100)" carries no information about how long a key can be. So far a default value of 4 is used, which is too small when the encoding is something like "fixed_length:16". The solution is to extend the data type expression to "top(100, 16)", indicating that one key can be up to 16 bytes long. If the "scale" is absent, use 6 bytes as the default.
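To illustrate the proposed fix, here is a minimal, hypothetical sketch (not Kylin's actual TopNCounterSerializer code) of how a "top(100, 16)" type string could be parsed so that storage estimates use the declared key length instead of a hard-coded 4 bytes; the class name, field names, and the per-counter overhead of 8 bytes for the count are assumptions for illustration only:

```java
// Hypothetical sketch, not Kylin's real implementation.
// Parses "top(N)" or "top(N, keyLength)" and derives a rough
// per-measure storage estimate from the declared key length.
public class TopNType {
    final int topN;       // number of entries kept, e.g. 100
    final int keyLength;  // max key bytes; defaults to 4 when absent

    TopNType(String spec) {
        // Extract the argument list between the parentheses,
        // e.g. "100, 16" from "top(100, 16)".
        String args = spec.substring(spec.indexOf('(') + 1, spec.lastIndexOf(')'));
        String[] parts = args.split(",");
        this.topN = Integer.parseInt(parts[0].trim());
        // The optional second argument ("scale") is the key length;
        // fall back to the 4-byte default when it is omitted.
        this.keyLength = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : 4;
    }

    // Rough estimate: each entry stores its key bytes plus an
    // assumed 8-byte double counter.
    int storageBytesEstimate() {
        return topN * (keyLength + 8);
    }

    public static void main(String[] args) {
        TopNType plain = new TopNType("top(100)");
        TopNType wide = new TopNType("top(100, 16)");
        System.out.println(plain.keyLength);             // 4 (default)
        System.out.println(wide.keyLength);              // 16
        System.out.println(wide.storageBytesEstimate()); // 2400
    }
}
```

With a "fixed_length:16" encoding, the default of 4 would undersize the estimate by a factor of roughly the key-length ratio, which is why the type expression needs to carry the length explicitly.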
> TopN memory estimation is inaccurate in some cases
> --------------------------------------------------
>
>                 Key: KYLIN-2243
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2243
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Shaofeng SHI
>            Assignee: Shaofeng SHI
>             Fix For: v2.0.0
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)