Wang, Gang created KYLIN-3115:
---------------------------------

             Summary: Incompatible RowKeySplitter initialization between build and merge jobs
                 Key: KYLIN-3115
                 URL: https://issues.apache.org/jira/browse/KYLIN-3115
             Project: Kylin
          Issue Type: Bug
            Reporter: Wang, Gang


In class NDCuboidBuilder:

    public NDCuboidBuilder(CubeSegment cubeSegment) {
        this.cubeSegment = cubeSegment;
        this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
        this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
    }
This creates temporary split buffers of length 256 into which the row key column bytes are copied.
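For context, here is a minimal, self-contained sketch of the allocation pattern implied by that constructor argument (plain Java; the method and parameter names are illustrative, not Kylin's actual fields):

    // Pre-allocate one fixed-size buffer per row key part; any column longer
    // than maxColumnLength cannot later be copied into its buffer.
    static byte[][] newSplitBuffers(int maxSplits, int maxColumnLength) {
        byte[][] buffers = new byte[maxSplits][];
        for (int i = 0; i < maxSplits; i++) {
            buffers[i] = new byte[maxColumnLength];
        }
        return buffers;
    }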

However, in class MergeCuboidMapper it is initialized with length 255:

    rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So if a dimension is encoded in fixed length and the length is 256, the cube build job will succeed, but the merge job will always fail.
In MergeCuboidMapper.doMap:

    public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException {
        long cuboidID = rowKeySplitter.split(key.getBytes());
        Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

doMap invokes RowKeySplitter.split(byte[] bytes), which copies each row key column into its pre-allocated split buffer:
    // rowkey columns
    for (int i = 0; i < cuboid.getColumns().size(); i++) {
        splitOffsets[i] = offset;
        TblColRef col = cuboid.getColumns().get(i);
        int colLength = colIO.getColumnLength(col);
        SplittedBytes split = this.splitBuffers[this.bufferSize++];
        split.length = colLength;
        System.arraycopy(bytes, offset, split.value, 0, colLength);
        offset += colLength;
    }
System.arraycopy will throw an IndexOutOfBoundsException if a column is 256 bytes long and is copied into a byte array of length 255.
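A minimal, self-contained reproduction of that failure, outside of Kylin (the class name and sizes are just for illustration):

    // Copying a 256-byte row key column into a 255-byte split buffer fails,
    // which is what the merge job runs into.
    public class ArrayCopyRepro {
        public static void main(String[] args) {
            byte[] rowkeyColumn = new byte[256]; // dimension encoded with fixed length 256
            byte[] splitBuffer = new byte[255];  // buffer allocated with max length 255
            // Throws an IndexOutOfBoundsException at runtime because the
            // destination array is smaller than the copy length.
            System.arraycopy(rowkeyColumn, 0, splitBuffer, 0, rowkeyColumn.length);
        }
    }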

The same incompatibility exists in class FilterRecommendCuboidDataMapper, which initializes its RowKeySplitter as:

    rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length to 256. Dimensions encoded in fixed length 256 are actually pretty common in our production: varchar(256) is a common column type in Hive, and users without much knowledge of encodings tend to choose fixed-length encoding for such dimensions and set the max length to 256.
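Concretely, the suggestion amounts to passing 256 in the merge-side mappers so they match NDCuboidBuilder; a sketch of the change, not a tested patch:

    // In MergeCuboidMapper
    rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 256);

    // In FilterRecommendCuboidDataMapper
    rowKeySplitter = new RowKeySplitter(originalSegment, 65, 256);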







