[
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shaofeng SHI updated KYLIN-3115:
--------------------------------
Fix Version/s: v2.4.0
> Incompatible RowKeySplitter initialize between build and merge job
> ------------------------------------------------------------------
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Reporter: Wang, Gang
> Assignee: Wang, Gang
> Priority: Minor
> Fix For: v2.4.0
>
>
> In class NDCuboidBuilder:
>     public NDCuboidBuilder(CubeSegment cubeSegment) {
>         this.cubeSegment = cubeSegment;
>         this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>         this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
>     }
> which creates a byte array of length 256 to hold each rowkey column's bytes.
> In class MergeCuboidMapper, however, the splitter is initialized with length 255:
>     rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So, if a dimension is encoded in fixed length and the max length is set to
> 256, the cube building job will succeed, while the merge job will always
> fail. In class MergeCuboidMapper, method doMap:
>     public void doMap(Text key, Text value, Context context) throws
>             IOException, InterruptedException {
>         long cuboidID = rowKeySplitter.split(key.getBytes());
>         Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> invokes RowKeySplitter.split(byte[] bytes), which copies each column into its buffer:
>     for (int i = 0; i < cuboid.getColumns().size(); i++) {
>         splitOffsets[i] = offset;
>         TblColRef col = cuboid.getColumns().get(i);
>         int colLength = colIO.getColumnLength(col);
>         SplittedBytes split = this.splitBuffers[this.bufferSize++];
>         split.length = colLength;
>         System.arraycopy(bytes, offset, split.value, 0, colLength);
>         offset += colLength;
>     }
> System.arraycopy will throw an IndexOutOfBoundsException if a column value
> of 256 bytes is copied into a destination byte array of length 255.
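> As a standalone illustration of the failure (not Kylin code; the array sizes
> simply mirror the 255/256 mismatch described above):
>     // Copying a 256-byte column value into a 255-byte split buffer fails the
>     // same way the merge job does.
>     public class ArrayCopyMismatchDemo {
>         public static void main(String[] args) {
>             byte[] rowkeyColumn = new byte[256]; // column encoded with fixed length 256
>             byte[] splitBuffer = new byte[255];  // buffer sized like the merge-side splitter
>             // Throws IndexOutOfBoundsException: the destination is one byte too short.
>             System.arraycopy(rowkeyColumn, 0, splitBuffer, 0, rowkeyColumn.length);
>         }
>     }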
> The same incompatibility occurs in class FilterRecommendCuboidDataMapper,
> which initializes the RowKeySplitter as:
>     rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256, as sketched below.
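> A minimal sketch of that idea, assuming a shared constant (the constant name and
> the variable name "segment" are illustrative, not actual Kylin identifiers):
>     // Hypothetical shared constant so all call sites agree on the buffer size.
>     public static final int MAX_SPLIT_BUFFER_LENGTH = 256;
>     // NDCuboidBuilder, MergeCuboidMapper and FilterRecommendCuboidDataMapper would
>     // then all construct the splitter the same way, "segment" being the cube
>     // segment each class already holds:
>     rowKeySplitter = new RowKeySplitter(segment, 65, MAX_SPLIT_BUFFER_LENGTH);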
> Dimensions encoded with fixed length 256 are actually pretty common in our
> production. Since the Hive type varchar(256) is widely used, users without
> much Kylin knowledge tend to choose fixed-length encoding for such
> dimensions and set the max length to 256.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)