Hmm... a dimension value longer than 4096 bytes, that doesn't sound good. Anyway,
this is a bug, I'm sure of it.

But do you really have a valid case that requires more than 4K to hold the
dimensions? That would mean the HBase rowkey is more than 4K!
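For clarity, here is a minimal, self-contained Java sketch of the overrun described in the quoted mail. This is NOT Kylin code: `SplitBufferDemo`, `SplittedBytes`, `unsafeToString` and `safeToString` are hypothetical stand-ins. The point it illustrates is that the `length` field records the real field length while the buffer holds at most 4096 bytes, so converting `length` bytes to a string reads past the end of the array:

```java
import java.nio.charset.StandardCharsets;

// Minimal stand-in (NOT Kylin code) for the buffer layout described below:
// 'length' records the REAL field length, which can exceed the capacity
// that was actually allocated for 'value'.
public class SplitBufferDemo {
    static class SplittedBytes {
        final byte[] value;
        int length; // real length of the field, not the bytes actually stored

        SplittedBytes(int capacity) {
            value = new byte[capacity];
        }
    }

    // Mirrors the failing conversion in toString(): reading 'length' bytes
    // from a buffer that only holds 'value.length' bytes throws an
    // IndexOutOfBoundsException, analogous to the report's second exception.
    static String unsafeToString(SplittedBytes b) {
        return new String(b.value, 0, b.length, StandardCharsets.UTF_8);
    }

    // Defensive variant: clamp to the bytes actually stored.
    static String safeToString(SplittedBytes b) {
        return new String(b.value, 0, Math.min(b.length, b.value.length),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SplittedBytes b = new SplittedBytes(8); // tiny stand-in for 4096
        byte[] src = "0123456789ab".getBytes(StandardCharsets.UTF_8); // 12 bytes
        System.arraycopy(src, 0, b.value, 0, b.value.length); // copy what fits
        b.length = src.length; // real length 12 > capacity 8

        try {
            unsafeToString(b);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("overrun detected"); // the secondary exception
        }
        System.out.println(safeToString(b)); // prints "01234567"
    }
}
```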

On Tue, Oct 27, 2015 at 1:10 PM, yu feng <[email protected]> wrote:

> I create a jira ticket here :
> https://issues.apache.org/jira/browse/KYLIN-1104
>
> 2015-10-27 11:50 GMT+08:00 yu feng <[email protected]>:
>
> > Hi all, I got an error in the step "Build Base Cuboid Data" when building
> > a new cube. After modifying the source code and checking the error log, I
> > found this stack trace:
> > java.lang.ArrayIndexOutOfBoundsException
> >     at org.apache.kylin.common.util.BytesSplitter.split(BytesSplitter.java:68)
> >     at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:212)
> >     at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:55)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> > split 0, value length 4096, real length 4876
> >
> > The last line is debug info I added myself, and I swapped these two lines
> > in the BaseCuboidMapper.handleErrorRecord function:
> >     ex.printStackTrace(System.err);
> >     System.err.println("Insane record: " + bytesSplitter);
> >
> > With this information I found the root cause: the job creates
> > bytesSplitter = new BytesSplitter(200, 4096); in setup. Once the length of
> > one of my dimension values is bigger than 4096, an
> > ArrayIndexOutOfBoundsException is thrown in BytesSplitter.split. In the
> > mapper function this exception is caught (I guess Kylin treats such a row
> > as an incorrect record, or did not anticipate this situation), and then
> > handleErrorRecord is called. However, that function prints the split info
> > like this:
> > System.err.println("Insane record: " + bytesSplitter);
> >
> > which calls bytesSplitter.toString():
> >
> >     public String toString() {
> >         StringBuilder buf = new StringBuilder();
> >         buf.append("[");
> >         for (int i = 0; i < bufferSize; i++) {
> >             if (i > 0)
> >                 buf.append(", ");
> >             buf.append(Bytes.toString(splitBuffers[i].value, 0, splitBuffers[i].length));
> >         }
> >         return buf.toString();
> >     }
> >
> > This method converts the bytes to a string and appends them to a
> > StringBuilder. But in the conversion the input is splitBuffers[i], whose
> > length field holds my column value's real length (4876 in my example; the
> > length is set before the data is copied in BytesSplitter.split), while the
> > array was only allocated 4096 bytes. That causes another
> > ArrayIndexOutOfBoundsException and makes the job fail.
> >
> > I think 4096 is meant to be the maximum dimension value length. Shouldn't
> > it be a config property? And we should catch the
> > ArrayIndexOutOfBoundsException; otherwise I cannot go on with my cube
> > building.
> >
> >
>
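One possible mitigation along the lines the mail asks for, sketched in plain Java. This is a hypothetical illustration, not Kylin's actual BytesSplitter: `SafeSplitter` and `maxFieldLength` are invented names, and the cap would come from the proposed config property. The idea is to truncate over-long fields at the configured limit instead of letting the copy overrun the buffer and throw:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (NOT Kylin's BytesSplitter) of the requested fix:
// make the per-field cap configurable and truncate over-long fields
// instead of letting an oversized dimension value crash the mapper.
public class SafeSplitter {
    private final int maxFieldLength; // stand-in for the 4096 cap

    public SafeSplitter(int maxFieldLength) {
        this.maxFieldLength = maxFieldLength;
    }

    // Split 'row' on 'delim'; each field is truncated to maxFieldLength
    // characters, so splitting never throws on long values.
    public List<String> split(String row, char delim) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= row.length(); i++) {
            if (i == row.length() || row.charAt(i) == delim) {
                String field = row.substring(start, i);
                if (field.length() > maxFieldLength) {
                    field = field.substring(0, maxFieldLength); // truncate, don't throw
                }
                out.add(field);
                start = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SafeSplitter s = new SafeSplitter(8); // tiny cap for the demo
        System.out.println(s.split("abc,0123456789,x", ','));
        // prints [abc, 01234567, x]
    }
}
```

Whether to truncate silently or to route the row through handleErrorRecord with a clamped toString() is a design choice; either way the job no longer dies on the secondary exception.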
