[ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507695#comment-16507695 ]
ASF subversion and git services commented on KYLIN-3115:
--------------------------------------------------------

Commit f6b1dfb5ef3239ea252b1498bf4c51235361bbcd in kylin's branch refs/heads/master from shaofengshi
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f6b1dfb ]

KYLIN-3115 Incompatible RowKeySplitter initialize between build and merge job


> Incompatible RowKeySplitter initialize between build and merge job
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3115
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3115
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>            Reporter: Wang, Gang
>            Assignee: Shaofeng SHI
>            Priority: Minor
>             Fix For: v2.4.0
>
>
> In class NDCuboidBuilder:
>     public NDCuboidBuilder(CubeSegment cubeSegment) {
>         this.cubeSegment = cubeSegment;
>         this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>         this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
>     }
> which creates a byte array of length 256 to hold the rowkey column bytes.
> In class MergeCuboidMapper, however, it is initialized with length 255:
>     rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So if a dimension uses fixed-length encoding and the max length is set to
> 256, the cube building job will succeed, while the merge job will always
> fail.
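
The effect of the two buffer sizes can be reproduced outside Kylin. The following is only a minimal sketch, assuming that each split buffer is a plain byte[] of the configured length and that a column value is copied into it with System.arraycopy. The class SplitBufferDemo and the method copyFits are hypothetical names for illustration and are not part of Kylin.

```java
final class SplitBufferDemo {

    // Hypothetical helper, not Kylin code: tries to copy a column value of
    // columnLength bytes into a buffer of bufferLength bytes and reports
    // whether the copy succeeded.
    static boolean copyFits(int bufferLength, int columnLength) {
        byte[] rowKey = new byte[columnLength]; // stands in for the rowkey bytes of one column
        byte[] buffer = new byte[bufferLength]; // stands in for one split buffer
        try {
            System.arraycopy(rowKey, 0, buffer, 0, columnLength);
            return true;
        } catch (IndexOutOfBoundsException e) {
            // Thrown when columnLength exceeds the destination length.
            return false;
        }
    }

    public static void main(String[] args) {
        // Buffer sized as in the build job (256) versus the merge job (255),
        // both receiving a 256-byte fixed-length column value.
        System.out.println("256-byte buffer: " + copyFits(256, 256)); // prints "256-byte buffer: true"
        System.out.println("255-byte buffer: " + copyFits(255, 256)); // prints "255-byte buffer: false"
    }
}
```

Under these assumptions, a 256-byte value fits only when the buffer is created with length 256, which matches the build-succeeds, merge-fails behaviour reported in the issue.
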
> The merge job fails in method doMap of class MergeCuboidMapper:
>     public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException {
>         long cuboidID = rowKeySplitter.split(key.getBytes());
>         Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> Method doMap invokes method RowKeySplitter.split(byte[] bytes):
>         for (int i = 0; i < cuboid.getColumns().size(); i++) {
>             splitOffsets[i] = offset;
>             TblColRef col = cuboid.getColumns().get(i);
>             int colLength = colIO.getColumnLength(col);
>             SplittedBytes split = this.splitBuffers[this.bufferSize++];
>             split.length = colLength;
>             System.arraycopy(bytes, offset, split.value, 0, colLength);
>             offset += colLength;
>         }
> System.arraycopy throws an IndexOutOfBoundsException if a column value is
> 256 bytes long and is being copied into a byte array of length 255.
> The same incompatibility occurs in class FilterRecommendCuboidDataMapper,
> which initializes RowKeySplitter as:
>     rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256.
> In fact, dimensions encoded with a fixed length of 256 are pretty common in
> our production. Since the type varchar(256) is pretty common in Hive, users
> without much Kylin knowledge tend to choose fixed-length encoding for such
> dimensions and set the max length to 256.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)