[
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shaofeng SHI updated KYLIN-3115:
--------------------------------
Fix Version/s: v2.4.0
> Incompatible RowKeySplitter initialize between build and merge job
> ------------------------------------------------------------------
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Reporter: Wang, Gang
> Assignee: Wang, Gang
> Priority: Minor
> Fix For: v2.4.0
>
>
> In class NDCuboidBuilder:
>     public NDCuboidBuilder(CubeSegment cubeSegment) {
>         this.cubeSegment = cubeSegment;
>         this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>         this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
>     }
> which creates a byte array of length 256 to hold each rowkey column's bytes.
> In class MergeCuboidMapper, however, the splitter is initialized with length 255:
>     rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So, if a dimension is encoded in fixed length and the max length is set to
> 256, the cube building job will succeed, while the merge job will always
> fail. In class MergeCuboidMapper, method doMap:
>     public void doMap(Text key, Text value, Context context) throws
>             IOException, InterruptedException {
>         long cuboidID = rowKeySplitter.split(key.getBytes());
>         Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> invokes RowKeySplitter.split(byte[] bytes), which copies each column into its buffer:
>     for (int i = 0; i < cuboid.getColumns().size(); i++) {
>         splitOffsets[i] = offset;
>         TblColRef col = cuboid.getColumns().get(i);
>         int colLength = colIO.getColumnLength(col);
>         SplittedBytes split = this.splitBuffers[this.bufferSize++];
>         split.length = colLength;
>         System.arraycopy(bytes, offset, split.value, 0, colLength);
>         offset += colLength;
>     }
> System.arraycopy will throw an IndexOutOfBoundsException if a column value
> of 256 bytes is copied into a destination byte array of length 255.
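> As a standalone illustration of the failure (not Kylin code; the array sizes
> simply mirror the 255/256 mismatch described above):
>     // Copying a 256-byte column value into a 255-byte split buffer fails the
>     // same way the merge job does.
>     public class ArrayCopyMismatchDemo {
>         public static void main(String[] args) {
>             byte[] rowkeyColumn = new byte[256]; // column encoded with fixed length 256
>             byte[] splitBuffer = new byte[255];  // buffer sized like the merge-side splitter
>             // Throws IndexOutOfBoundsException: the destination is one byte too short.
>             System.arraycopy(rowkeyColumn, 0, splitBuffer, 0, rowkeyColumn.length);
>         }
>     }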
> The same incompatibility occurs in class FilterRecommendCuboidDataMapper,
> which initializes the RowKeySplitter as:
>     rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256, as sketched below.
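> A minimal sketch of that idea, assuming a shared constant (the constant name and
> the variable name "segment" are illustrative, not actual Kylin identifiers):
>     // Hypothetical shared constant so all call sites agree on the buffer size.
>     public static final int MAX_SPLIT_BUFFER_LENGTH = 256;
>     // NDCuboidBuilder, MergeCuboidMapper and FilterRecommendCuboidDataMapper would
>     // then all construct the splitter the same way, "segment" being the cube
>     // segment each class already holds:
>     rowKeySplitter = new RowKeySplitter(segment, 65, MAX_SPLIT_BUFFER_LENGTH);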
> Dimensions encoded with fixed length 256 are actually pretty common in our
> production. Since the Hive type varchar(256) is widely used, users without
> much Kylin knowledge tend to choose fixed-length encoding for such
> dimensions and set the max length to 256.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)