Shaofeng SHI commented on KYLIN-3296:

Hello, I merged this change but reverted it soon afterwards.


The concern is performance: the split() method of RowKeySplitter is called 
a great many times. With this patch, it performs an additional buffer-size 
check for each column on every call, yet the oversized case the check guards 
against occurs only on very rare occasions.


Besides, as I mentioned previously, "using fixed_length 500 as the encoding is 
terrible"; that is not recommended for Kylin. HBase also recommends keeping 
the rowkey as short as possible. The 255 bytes that Kylin currently allocates 
per column is already more than enough.


There are alternative ways:

1) add a restriction on the rowkey length

2) when constructing the "RowKeySplitter" object, pass in the concrete maximum 
length of the rowkey for this segment. Then there is no need for the repeated 
check.
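
To make alternative 2 concrete, here is a rough sketch of the idea, not the 
actual Kylin code. The class PresizedSplitter, its constructor arguments, and 
the per-column buffer layout are all hypothetical; the only point is that the 
buffers are sized once at construction, so split() needs no per-column 
capacity check.

```java
// Hypothetical stand-in for RowKeySplitter; names and layout are assumptions.
final class PresizedSplitter {
    private final int[] columnLengths; // encoded byte length of each rowkey column
    private final byte[][] buffers;    // one reusable buffer per column

    PresizedSplitter(int[] columnLengths, int defaultCapacity) {
        this.columnLengths = columnLengths.clone();
        // Find the widest column once, at construction time.
        int maxLength = defaultCapacity;
        for (int len : columnLengths) {
            maxLength = Math.max(maxLength, len);
        }
        this.buffers = new byte[columnLengths.length][maxLength];
    }

    // Hot path: no capacity check here, because every buffer already fits.
    byte[][] split(byte[] rowKey) {
        int offset = 0;
        for (int i = 0; i < columnLengths.length; i++) {
            System.arraycopy(rowKey, offset, buffers[i], 0, columnLengths[i]);
            offset += columnLengths[i];
        }
        return buffers;
    }

    int bufferCapacity() {
        return buffers.length == 0 ? 0 : buffers[0].length;
    }
}
```

With column lengths {8, 500} and a default of 255, the buffers are allocated 
with 500 bytes each, and the cost of the size check is paid once per splitter 
rather than once per column per row.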

> When merge cube,get java.lang.ArrayIndexOutOfBoundsException at 
> java.lang.System.arraycopy(Native Method)
> ---------------------------------------------------------------------------------------------------------
>                 Key: KYLIN-3296
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3296
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.3.0
>            Reporter: RenZhiMin
>            Assignee: RenZhiMin
>            Priority: Major
>              Labels: patch
>         Attachments: JIRA.master.3296.patch
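> In the cube's rowkey design, one dimension is encoded as fixed length 500. 
> The cube is built every day with the in-memory build algorithm. When merging 
> the cube, the map tasks of the generated MR job fail with 
> "java.lang.ArrayIndexOutOfBoundsException
>  at java.lang.System.arraycopy(Native Method)". 
> On inspection, the map task of the generated MR job has to split the rowkey 
> of the cuboid data being merged. During the split, it takes the length of 
> each dimension from that dimension's encoding, reads that many bytes from the 
> rowkey, and assigns them to the value of SplittedBytes. Because the value 
> array is initialized with a fixed size of 255, splitting a dimension value 
> longer than 255 triggers the index-out-of-bounds error.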
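
The failure described in the report can be reproduced in isolation with a 
minimal sketch like the one below. It is not Kylin code: FixedBufferDemo, 
copyColumn and overflows are hypothetical names, and the only detail taken 
from the report is the fixed 255-byte value array that is filled with 
System.arraycopy.

```java
// Minimal reproduction sketch; the class and method names are hypothetical.
final class FixedBufferDemo {
    static final int BUFFER_SIZE = 255; // fixed capacity, as described in the report

    // Copies `length` bytes of the rowkey into a fixed 255-byte buffer.
    static byte[] copyColumn(byte[] rowKey, int offset, int length) {
        byte[] value = new byte[BUFFER_SIZE];
        // Throws when length exceeds the capacity of `value`.
        System.arraycopy(rowKey, offset, value, 0, length);
        return value;
    }

    // Returns true when copying a column of the given length overflows the buffer.
    static boolean overflows(int length) {
        try {
            copyColumn(new byte[length], 0, length);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // ArrayIndexOutOfBoundsException is a subclass of this type.
            return true;
        }
    }
}
```

A column of up to 255 bytes copies cleanly, while a fixed-length-500 column 
overflows the destination array.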

This message was sent by Atlassian JIRA
