I do not actually know what kind of data is in the input tables; I ran into this problem while building a cube for another product, and it really blocked me (until I modified the source code). I think we can treat those rows as invalid data and continue building the cube.
2015-10-29 18:13 GMT+08:00 Li Yang <[email protected]>:

> Em... dimension length goes beyond 4096.. that sounds not good. Anyway,
> this is a bug, I'm sure of it.
>
> But do you really have a valid case that requires more than 4K to hold
> dimensions? That means the HBase rowkey will be more than 4K!
>
> On Tue, Oct 27, 2015 at 1:10 PM, yu feng <[email protected]> wrote:
>
> > I created a JIRA ticket here:
> > https://issues.apache.org/jira/browse/KYLIN-1104
> >
> > 2015-10-27 11:50 GMT+08:00 yu feng <[email protected]>:
> >
> > > Hi all, I got an error in the "Build Base Cuboid Data" step while
> > > building a new cube. After modifying the source code and checking
> > > the error log, I found this stack trace:
> > >
> > > java.lang.ArrayIndexOutOfBoundsException
> > >     at org.apache.kylin.common.util.BytesSplitter.split(BytesSplitter.java:68)
> > >     at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:212)
> > >     at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:55)
> > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> > >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> > >     at java.security.AccessController.doPrivileged(Native Method)
> > >     at javax.security.auth.Subject.doAs(Subject.java:415)
> > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> > >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> > > split 0, value length 4096, real length 4876
> > >
> > > The last line is debug info I added myself; I also swapped these two
> > > lines in the BaseCuboidMapper.handleErrorRecord function:
> > >
> > >     ex.printStackTrace(System.err);
> > >     System.err.println("Insane record: " + bytesSplitter);
> > >
> > > With this information I found the root cause: the job creates
> > > bytesSplitter = new BytesSplitter(200, 4096) in setup(). Once the
> > > length of one of my dimension values exceeds 4096, an
> > > ArrayIndexOutOfBoundsException is thrown in BytesSplitter.split. In
> > > the map function this exception is caught (I guess Kylin treats such
> > > a row as an incorrect record, or simply did not anticipate this
> > > situation) and handleErrorRecord is called. However, that function
> > > prints the split info like this:
> > >
> > >     System.err.println("Insane record: " + bytesSplitter);
> > >
> > > which calls bytesSplitter.toString():
> > >
> > >     public String toString() {
> > >         StringBuilder buf = new StringBuilder();
> > >         buf.append("[");
> > >         for (int i = 0; i < bufferSize; i++) {
> > >             if (i > 0)
> > >                 buf.append(", ");
> > >
> > >             buf.append(Bytes.toString(splitBuffers[i].value, 0, splitBuffers[i].length));
> > >         }
> > >         return buf.toString();
> > >     }
> > >
> > > This function converts the bytes to strings and appends them to a
> > > StringBuilder. But the input to the conversion is splitBuffers[i],
> > > whose length field holds my column value's real length (4876 in my
> > > example; the length is set before the data is copied in
> > > BytesSplitter.split), while the backing array was only allocated
> > > 4096 bytes. That causes another ArrayIndexOutOfBoundsException and
> > > makes the job fail.
> > >
> > > I think 4096 is the maximum dimension value length. Shouldn't it be
> > > a config property? And we should catch the
> > > ArrayIndexOutOfBoundsException; otherwise, I cannot continue with my
> > > cube building.
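For readers following the thread: the double fault described above can be reproduced in isolation. The sketch below is a minimal stand-in, not Kylin's actual BytesSplitter; the class and field names are simplified, but the sequence matches the report: the length field is recorded before the copy, the copy overruns the fixed 4096-byte buffer, and the error handler then trusts the stale length field and overruns a second time.

    // Minimal stand-in for the failure pattern (simplified names, not
    // Kylin's real BytesSplitter): length is recorded before the copy,
    // so after the first overrun the buffer lies about its own size.
    public class SplitterOverrunDemo {

        static final int BUFFER_SIZE = 4096; // the hard-coded per-split allocation

        static class SplitBuffer {
            byte[] value = new byte[BUFFER_SIZE];
            int length; // set to the *real* column length, even on overrun
        }

        static void split(SplitBuffer buf, byte[] input) {
            buf.length = input.length; // recorded first, as in the report
            for (int i = 0; i < input.length; i++) {
                buf.value[i] = input[i]; // throws ArrayIndexOutOfBoundsException at i == 4096
            }
        }

        public static void main(String[] args) {
            byte[] column = new byte[4876]; // the oversized dimension value from the log
            java.util.Arrays.fill(column, (byte) 'x');
            SplitBuffer buf = new SplitBuffer();
            try {
                split(buf, column); // first exception, caught like in the mapper
            } catch (ArrayIndexOutOfBoundsException ex) {
                ex.printStackTrace(System.err);
                // Mirrors toString(): reads buf.length (4876) bytes from a
                // 4096-byte array, so a second out-of-bounds exception
                // escapes the handler and kills the task, like the failed job.
                System.err.println("Insane record: "
                        + new String(buf.value, 0, buf.length, java.nio.charset.StandardCharsets.UTF_8));
            }
        }
    }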

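On the proposed fix: the eventual KYLIN-1104 patch is not shown here, but one defensive direction consistent with the thread is to make the diagnostic path itself unable to overrun, by clamping the printable length to the actual allocation. The sketch below uses assumed names (safeToString is not a real Kylin method):

    import java.nio.charset.StandardCharsets;

    // A sketch of the hardening suggested above (assumed names, not the
    // actual KYLIN-1104 patch): never let the error-reporting path read
    // past the allocation, even when the recorded length is larger.
    public class SafeSplitPrinting {

        static String safeToString(byte[] buffer, int recordedLength) {
            int safeLen = Math.min(recordedLength, buffer.length); // clamp to allocation
            String s = new String(buffer, 0, safeLen, StandardCharsets.UTF_8);
            if (recordedLength > buffer.length) {
                // Flag the truncation so the log still tells the full story.
                s += " ...(truncated: recorded length " + recordedLength
                        + ", buffer only " + buffer.length + " bytes)";
            }
            return s;
        }

        public static void main(String[] args) {
            byte[] buffer = new byte[8];
            for (int i = 0; i < buffer.length; i++) {
                buffer[i] = (byte) ('a' + i);
            }
            // A recorded length of 12 against an 8-byte allocation: formerly
            // a second crash, now a truncated-but-safe diagnostic line.
            System.out.println(safeToString(buffer, 12));
        }
    }

Making the 4096 limit configurable, as the thread suggests, would then only move the truncation point; the clamp keeps the error-reporting path safe regardless, so oversized rows can be logged and skipped instead of failing the whole job.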