Max dimension value length cause job error

yu feng Mon, 26 Oct 2015 20:51:07 -0700

Hi all, I get an error in the step "Build Base Cuboid Data" when I build a new
cube. After modifying the source code and checking the error log, I found
this stack trace:
java.lang.ArrayIndexOutOfBoundsException
at org.apache.kylin.common.util.BytesSplitter.split(BytesSplitter.java:68)
at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:212)
at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:55)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
split 0, value length 4096, real length 4876


The last line is debug info that I added myself. I also swapped the order of
these two lines in the BaseCuboidMapper.handleErrorRecord function:
                ex.printStackTrace(System.err);
                System.err.println("Insane record: " + bytesSplitter);

With this information I found the root cause. In setup, the job creates
bytesSplitter = new BytesSplitter(200, 4096);
Once the length of a dimension value is bigger than 4096,
BytesSplitter.split throws an ArrayIndexOutOfBoundsException. The mapper's
map function catches this exception (I guess Kylin either treats the row as
an incorrect row or did not consider this situation) and then calls
handleErrorRecord. However, that function prints the split info like this:
System.err.println("Insane record: " + bytesSplitter);

which calls bytesSplitter.toString():
public String toString() {
        StringBuilder buf = new StringBuilder();
        buf.append("[");
        for (int i = 0; i < bufferSize; i++) {
            if (i > 0)
                buf.append(", ");

            buf.append(Bytes.toString(splitBuffers[i].value, 0, splitBuffers[i].length));
        }
        return buf.toString();
    }

This function converts the bytes to strings and appends them to a
StringBuilder. But in the conversion the input is splitBuffers[i], whose
length field holds my column value's length (4876 in my example; the length
is set before the data is copied in BytesSplitter.split), while the array
was allocated with only 4096 bytes. That causes another
ArrayIndexOutOfBoundsException and makes the job fail.
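
The failure mode described above can be sketched in isolation. This is only
an illustration, not Kylin code: the Slice class, fill and safeToString are
hypothetical stand-ins for one splitBuffers entry, the copy step in
BytesSplitter.split, and a more defensive toString. It assumes the length is
recorded before the copy, as described above, and shows how clamping the
read to the allocated array avoids the second exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitBufferSketch {
    // Hypothetical stand-in for one splitBuffers entry:
    // a fixed-size array plus a separately recorded length.
    static final class Slice {
        final byte[] value;
        int length;

        Slice(int capacity) {
            this.value = new byte[capacity];
        }
    }

    // Mimics the described behaviour: the length is recorded first,
    // then the copy fails if the source is larger than the array.
    static void fill(Slice slice, byte[] src) {
        slice.length = src.length;
        System.arraycopy(src, 0, slice.value, 0, src.length);
    }

    // Defensive rendering: never read past the allocated array,
    // even if the recorded length is larger than the capacity.
    static String safeToString(Slice slice) {
        int readable = Math.max(0, Math.min(slice.length, slice.value.length));
        String text = new String(slice.value, 0, readable, StandardCharsets.UTF_8);
        if (readable < slice.length) {
            return text + "...(truncated, real length " + slice.length + ")";
        }
        return text;
    }

    public static void main(String[] args) {
        Slice slice = new Slice(4096);
        byte[] oversized = new byte[4876];
        Arrays.fill(oversized, (byte) 'a');
        try {
            fill(slice, oversized);
        } catch (IndexOutOfBoundsException e) {
            // The recorded length now exceeds the capacity of the array.
            System.err.println("value length " + slice.value.length
                    + ", real length " + slice.length);
        }
        // Rendering the slice no longer throws a second exception.
        System.err.println("Insane record length: " + safeToString(slice).length());
    }
}
```

With a guard like this the error handler can still log the bad record, so
the row would be counted as an error record instead of killing the mapper.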

I think 4096 is the max dimension value length. Would it make sense to make
it a config property? We should also catch the
ArrayIndexOutOfBoundsException. Otherwise, I cannot go on with my cube
building.
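
As a rough sketch of what such a config property could look like: the
property name example.max.split.length below is hypothetical and is not an
existing Kylin setting, and the helper names are made up for illustration.
The idea is to read the limit from configuration, fall back to 4096, and
check the value length up front rather than relying on an exception.

```java
import java.util.Properties;

public class ConfigurableSplitLimit {
    // Hypothetical property name, NOT an existing Kylin setting.
    static final String MAX_LENGTH_KEY = "example.max.split.length";
    // Matches the hard-coded value discussed in this thread.
    static final int DEFAULT_MAX_LENGTH = 4096;

    // Reads the limit, falling back to the default when the
    // property is missing, not a number, or not positive.
    static int maxLength(Properties props) {
        String raw = props.getProperty(MAX_LENGTH_KEY);
        if (raw == null) {
            return DEFAULT_MAX_LENGTH;
        }
        try {
            int parsed = Integer.parseInt(raw.trim());
            return parsed > 0 ? parsed : DEFAULT_MAX_LENGTH;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_LENGTH;
        }
    }

    // Explicit check a splitter could run before copying a value.
    static boolean fits(byte[] value, int limit) {
        return value.length <= limit;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_LENGTH_KEY, "8192");
        int limit = maxLength(props);
        byte[] value = new byte[4876];
        System.out.println("limit=" + limit + ", fits=" + fits(value, limit));
    }
}
```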
