Em... dimension length goes beyond 4096... that does not sound good. Anyway, this is a bug, I'm sure of it.
But do you really have a valid case that requires more than 4K to hold dimensions? That means the HBase rowkey will be more than 4K!

On Tue, Oct 27, 2015 at 1:10 PM, yu feng <[email protected]> wrote:

> I created a jira ticket here:
> https://issues.apache.org/jira/browse/KYLIN-1104
>
> 2015-10-27 11:50 GMT+08:00 yu feng <[email protected]>:
>
> > Hi all, I get an error in the step "Build Base Cuboid Data" when I build a
> > new cube. After modifying the source code and checking the error log I
> > found this stacktrace:
> >
> > java.lang.ArrayIndexOutOfBoundsException
> >     at org.apache.kylin.common.util.BytesSplitter.split(BytesSplitter.java:68)
> >     at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:212)
> >     at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:55)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> > split 0, value length 4096, real length 4876
> >
> > The last line is debug info added by myself; I also swapped these two lines:
> >
> > ex.printStackTrace(System.err);
> > System.err.println("Insane record: " + bytesSplitter);
> >
> > in the BaseCuboidMapper.handleErrorRecord function.
> >
> > With this information I found the root cause: the job creates
> > bytesSplitter = new BytesSplitter(200, 4096); in setup. Once the length of
> > my dimension value is bigger than 4096, an ArrayIndexOutOfBoundsException
> > is thrown in BytesSplitter.split, and in the map function this exception
> > is caught (I guess Kylin treats such a row as an incorrect record, or did
> > not consider this situation), and handleErrorRecord is called. However,
> > this function prints the split info like this:
> >
> > System.err.println("Insane record: " + bytesSplitter);
> >
> > which calls bytesSplitter.toString():
> >
> > public String toString() {
> >     StringBuilder buf = new StringBuilder();
> >     buf.append("[");
> >     for (int i = 0; i < bufferSize; i++) {
> >         if (i > 0)
> >             buf.append(", ");
> >
> >         buf.append(Bytes.toString(splitBuffers[i].value, 0, splitBuffers[i].length));
> >     }
> >     return buf.toString();
> > }
> >
> > This function converts the bytes to a string and appends them to a
> > StringBuilder. But in the conversion the input is splitBuffers[i], whose
> > length field is my column value's length (4876 in my example; the length
> > is set before the data is copied in BytesSplitter.split), while the array
> > was only allocated 4096 bytes. That causes another
> > ArrayIndexOutOfBoundsException and makes the job fail.
> >
> > I think 4096 is the maximum dimension value length. Is it necessary to
> > make it a config property? We should also catch the
> > ArrayIndexOutOfBoundsException. Otherwise, I cannot go on with my cube
> > building.
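For reference, here is a minimal, self-contained sketch of the kind of clamping that would keep toString() from reading past the allocated buffer. SplitBuffer and describe() are stand-in names invented for this sketch (the real code lives in Kylin's BytesSplitter and its split buffers), so treat it as an illustration of the idea rather than the actual fix:

import java.nio.charset.StandardCharsets;

public class BytesSplitterToStringSketch {

    // Stand-in for a per-split buffer: a fixed-size byte array plus the
    // recorded length of the value that was supposed to fit into it.
    static class SplitBuffer {
        final byte[] value;
        int length; // can end up larger than value.length for an oversized column

        SplitBuffer(int capacity) {
            this.value = new byte[capacity];
        }
    }

    // toString()-style rendering that never reads past the end of the backing
    // array: the printed length is clamped to the buffer capacity, so an
    // oversized record is reported as truncated instead of throwing a second
    // exception inside the error handler.
    static String describe(SplitBuffer[] splitBuffers, int bufferSize) {
        StringBuilder buf = new StringBuilder();
        buf.append("[");
        for (int i = 0; i < bufferSize; i++) {
            if (i > 0)
                buf.append(", ");
            int safeLength = Math.min(splitBuffers[i].length, splitBuffers[i].value.length);
            buf.append(new String(splitBuffers[i].value, 0, safeLength, StandardCharsets.UTF_8));
            if (safeLength < splitBuffers[i].length) {
                buf.append("...(truncated, real length ").append(splitBuffers[i].length).append(")");
            }
        }
        return buf.append("]").toString();
    }

    public static void main(String[] args) {
        SplitBuffer oversized = new SplitBuffer(4096);
        oversized.length = 4876; // recorded length larger than the 4096-byte buffer
        System.out.println(describe(new SplitBuffer[] { oversized }, 1));
    }
}

With the length clamped to the buffer capacity, an oversized record gets logged together with its real length instead of triggering the second ArrayIndexOutOfBoundsException that currently fails the job.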
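As for making the limit configurable, a rough sketch of the mapper-side wiring could look like the following. The property name kylin.job.max.dimension.length is purely hypothetical, picked here for illustration; it is not an existing Kylin setting:

import org.apache.hadoop.conf.Configuration;

// Sketch only: read the per-column byte limit from the job configuration
// instead of hard-coding 4096 when creating the BytesSplitter in setup().
public class SplitterBufferSizeSketch {

    // Hypothetical property name, used for illustration only.
    static final String MAX_DIM_LENGTH_KEY = "kylin.job.max.dimension.length";
    static final int DEFAULT_MAX_DIM_LENGTH = 4096;

    static int resolveMaxDimensionLength(Configuration conf) {
        // Falls back to the current default when the property is not set.
        return conf.getInt(MAX_DIM_LENGTH_KEY, DEFAULT_MAX_DIM_LENGTH);
    }

    // BaseCuboidMapper.setup() could then construct the splitter as:
    //   bytesSplitter = new BytesSplitter(200, resolveMaxDimensionLength(context.getConfiguration()));
}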
