[
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wang, Gang updated KYLIN-3115:
------------------------------
Priority: Minor (was: Major)
> Incompatible RowKeySplitter initialization between build and merge jobs
> -----------------------------------------------------------------------
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
> Issue Type: Bug
> Reporter: Wang, Gang
> Assignee: Wang, Gang
> Priority: Minor
>
> In class NDCuboidBuilder:
> public NDCuboidBuilder(CubeSegment cubeSegment) {
>     this.cubeSegment = cubeSegment;
>     this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>     this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
> }
> which creates temporary byte arrays of length 256 to hold the row key
> column bytes.
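> For context, the third constructor argument is the maximum byte length of a
> single split buffer. A minimal sketch of the allocation the constructor
> presumably performs (the exact body is an assumption; SplittedBytes and its
> fields appear in the split() code quoted further below):
> // Hypothetical sketch, not Kylin's exact constructor body.
> public RowKeySplitter(CubeSegment cubeSeg, int splitLen, int bytesLen) {
>     this.splitBuffers = new SplittedBytes[splitLen];
>     for (int i = 0; i < splitLen; i++) {
>         // Each buffer holds at most bytesLen bytes; 256 vs. 255 is the bug.
>         this.splitBuffers[i] = new SplittedBytes(bytesLen);
>     }
> }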
> In class MergeCuboidMapper, however, it is initialized with length 255:
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So if a dimension is encoded with fixed-length encoding and the length is
> 256, the cube build job will succeed, while the merge job will always fail.
> public void doMap(Text key, Text value, Context context) throws
>         IOException, InterruptedException {
>     long cuboidID = rowKeySplitter.split(key.getBytes());
>     Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> In method doMap, it invokes RowKeySplitter.split(byte[] bytes):
> // rowkey columns
> for (int i = 0; i < cuboid.getColumns().size(); i++) {
>     splitOffsets[i] = offset;
>     TblColRef col = cuboid.getColumns().get(i);
>     int colLength = colIO.getColumnLength(col);
>     SplittedBytes split = this.splitBuffers[this.bufferSize++];
>     split.length = colLength;
>     System.arraycopy(bytes, offset, split.value, 0, colLength);
>     offset += colLength;
> }
> System.arraycopy throws an IndexOutOfBoundsException when a column is 256
> bytes long and is copied into a byte array of length 255.
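> A standalone illustration of the failure (hypothetical demo class; the
> sizes mirror a fixed-length-256 column and the merge job's 255-byte buffer):
> public class ArrayCopyDemo {
>     public static void main(String[] args) {
>         byte[] rowkeyBytes = new byte[256]; // column encoded as fixed length 256
>         byte[] splitBuffer = new byte[255]; // buffer sized with the merge job's 255
>         // Throws java.lang.ArrayIndexOutOfBoundsException: the destination
>         // array is one byte too short for the requested copy length.
>         System.arraycopy(rowkeyBytes, 0, splitBuffer, 0, rowkeyBytes.length);
>     }
> }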
> The same incompatibility occurs in class FilterRecommendCuboidDataMapper,
> which initializes the RowKeySplitter as:
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256. In
> fact, dimensions encoded as fixed length 256 are pretty common in our
> production: since the type varchar(256) is pretty common in Hive, users
> without much Kylin knowledge will prefer fixed-length encoding on such
> dimensions and set the max length to 256.
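> A sketch of the proposed change (the constant name is mine; the call sites
> are the ones quoted above):
> // Hypothetical shared constant so build, merge, and filter jobs agree.
> public static final int MAX_COLUMN_SPLIT_BYTES = 256;
> // In MergeCuboidMapper (was 255):
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, MAX_COLUMN_SPLIT_BYTES);
> // In FilterRecommendCuboidDataMapper (was 255):
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, MAX_COLUMN_SPLIT_BYTES);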
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)