[
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wang, Gang updated KYLIN-3115:
------------------------------
Priority: Minor (was: Major)
> Incompatible RowKeySplitter initialization between build and merge jobs
> -----------------------------------------------------------------------
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
> Issue Type: Bug
> Reporter: Wang, Gang
> Assignee: Wang, Gang
> Priority: Minor
>
> In class NDCuboidBuilder:
> public NDCuboidBuilder(CubeSegment cubeSegment) {
>     this.cubeSegment = cubeSegment;
>     this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>     this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
> }
> which creates temporary byte arrays of length 256 to hold the row key
> column bytes.
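> For context, the third constructor argument is the maximum byte length of a
> single split buffer. A minimal sketch of the allocation the constructor
> presumably performs (the exact body is an assumption; SplittedBytes and its
> fields appear in the split() code quoted further below):
> // Hypothetical sketch, not Kylin's exact constructor body.
> public RowKeySplitter(CubeSegment cubeSeg, int splitLen, int bytesLen) {
>     this.splitBuffers = new SplittedBytes[splitLen];
>     for (int i = 0; i < splitLen; i++) {
>         // Each buffer holds at most bytesLen bytes; 256 vs. 255 is the bug.
>         this.splitBuffers[i] = new SplittedBytes(bytesLen);
>     }
> }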
> In class MergeCuboidMapper, however, it is initialized with length 255:
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So if a dimension is encoded with fixed-length encoding and the length is
> 256, the cube build job will succeed, while the merge job will always fail.
> public void doMap(Text key, Text value, Context context) throws
>         IOException, InterruptedException {
>     long cuboidID = rowKeySplitter.split(key.getBytes());
>     Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> In method doMap, it invokes RowKeySplitter.split(byte[] bytes):
> // rowkey columns
> for (int i = 0; i < cuboid.getColumns().size(); i++) {
>     splitOffsets[i] = offset;
>     TblColRef col = cuboid.getColumns().get(i);
>     int colLength = colIO.getColumnLength(col);
>     SplittedBytes split = this.splitBuffers[this.bufferSize++];
>     split.length = colLength;
>     System.arraycopy(bytes, offset, split.value, 0, colLength);
>     offset += colLength;
> }
> System.arraycopy throws an IndexOutOfBoundsException when a column is 256
> bytes long and is copied into a byte array of length 255.
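> A standalone illustration of the failure (hypothetical demo class; the
> sizes mirror a fixed-length-256 column and the merge job's 255-byte buffer):
> public class ArrayCopyDemo {
>     public static void main(String[] args) {
>         byte[] rowkeyBytes = new byte[256]; // column encoded as fixed length 256
>         byte[] splitBuffer = new byte[255]; // buffer sized with the merge job's 255
>         // Throws java.lang.ArrayIndexOutOfBoundsException: the destination
>         // array is one byte too short for the requested copy length.
>         System.arraycopy(rowkeyBytes, 0, splitBuffer, 0, rowkeyBytes.length);
>     }
> }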
> The same incompatibility occurs in class FilterRecommendCuboidDataMapper,
> which initializes the RowKeySplitter as:
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256. In
> fact, dimensions encoded as fixed length 256 are pretty common in our
> production: since the type varchar(256) is pretty common in Hive, users
> without much Kylin knowledge will prefer fixed-length encoding on such
> dimensions and set the max length to 256.
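> A sketch of the proposed change (the constant name is mine; the call sites
> are the ones quoted above):
> // Hypothetical shared constant so build, merge, and filter jobs agree.
> public static final int MAX_COLUMN_SPLIT_BYTES = 256;
> // In MergeCuboidMapper (was 255):
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, MAX_COLUMN_SPLIT_BYTES);
> // In FilterRecommendCuboidDataMapper (was 255):
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, MAX_COLUMN_SPLIT_BYTES);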
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)