I do not actually know what kind of data is in the input tables; I ran into this problem while building a cube for another product, and it really blocked me (until I modified the source code). I think we can treat those rows as invalid data and continue building the cube.
2015-10-29 18:13 GMT+08:00 Li Yang <[email protected]>:

> Em... dimension length goes beyond 4096.. that sounds not good. Anyway,
> this is a bug, I'm sure of it.
>
> But do you really have a valid case that requires more than 4K to hold
> dimensions? That means the HBase rowkey will be more than 4K!
>
> On Tue, Oct 27, 2015 at 1:10 PM, yu feng <[email protected]> wrote:
>
> > I created a JIRA ticket here:
> > https://issues.apache.org/jira/browse/KYLIN-1104
> >
> > 2015-10-27 11:50 GMT+08:00 yu feng <[email protected]>:
> >
> > > Hi all, I got an error in the "Build Base Cuboid Data" step while
> > > building a new cube. After modifying the source code and checking
> > > the error log, I found this stack trace:
> > >
> > > java.lang.ArrayIndexOutOfBoundsException
> > >     at org.apache.kylin.common.util.BytesSplitter.split(BytesSplitter.java:68)
> > >     at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:212)
> > >     at org.apache.kylin.job.hadoop.cube.BaseCuboidMapper.map(BaseCuboidMapper.java:55)
> > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> > >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> > >     at java.security.AccessController.doPrivileged(Native Method)
> > >     at javax.security.auth.Subject.doAs(Subject.java:415)
> > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> > >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> > > split 0, value length 4096, real length 4876
> > >
> > > The last line is debug info I added myself; I also swapped these two
> > > lines in the BaseCuboidMapper.handleErrorRecord function:
> > >
> > >     ex.printStackTrace(System.err);
> > >     System.err.println("Insane record: " + bytesSplitter);
> > >
> > > With this information I found the root cause: the job creates
> > > bytesSplitter = new BytesSplitter(200, 4096) in setup(). Once the
> > > length of one of my dimension values exceeds 4096, an
> > > ArrayIndexOutOfBoundsException is thrown in BytesSplitter.split. In
> > > the map function this exception is caught (I guess Kylin treats such
> > > a row as an incorrect record, or simply did not anticipate this
> > > situation) and handleErrorRecord is called. However, that function
> > > prints the split info like this:
> > >
> > >     System.err.println("Insane record: " + bytesSplitter);
> > >
> > > which calls bytesSplitter.toString():
> > >
> > >     public String toString() {
> > >         StringBuilder buf = new StringBuilder();
> > >         buf.append("[");
> > >         for (int i = 0; i < bufferSize; i++) {
> > >             if (i > 0)
> > >                 buf.append(", ");
> > >
> > >             buf.append(Bytes.toString(splitBuffers[i].value, 0, splitBuffers[i].length));
> > >         }
> > >         return buf.toString();
> > >     }
> > >
> > > This function converts the bytes to strings and appends them to a
> > > StringBuilder. But the input to the conversion is splitBuffers[i],
> > > whose length field holds my column value's real length (4876 in my
> > > example; the length is set before the data is copied in
> > > BytesSplitter.split), while the backing array was only allocated
> > > 4096 bytes. That causes another ArrayIndexOutOfBoundsException and
> > > makes the job fail.
> > >
> > > I think 4096 is the maximum dimension value length. Shouldn't it be
> > > a config property? And we should catch the
> > > ArrayIndexOutOfBoundsException; otherwise, I cannot continue with my
> > > cube building.
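For readers following the thread: the double fault described above can be reproduced in isolation. The sketch below is a minimal stand-in, not Kylin's actual BytesSplitter; the class and field names are simplified, but the sequence matches the report: the length field is recorded before the copy, the copy overruns the fixed 4096-byte buffer, and the error handler then trusts the stale length field and overruns a second time.

    // Minimal stand-in for the failure pattern (simplified names, not
    // Kylin's real BytesSplitter): length is recorded before the copy,
    // so after the first overrun the buffer lies about its own size.
    public class SplitterOverrunDemo {

        static final int BUFFER_SIZE = 4096; // the hard-coded per-split allocation

        static class SplitBuffer {
            byte[] value = new byte[BUFFER_SIZE];
            int length; // set to the *real* column length, even on overrun
        }

        static void split(SplitBuffer buf, byte[] input) {
            buf.length = input.length; // recorded first, as in the report
            for (int i = 0; i < input.length; i++) {
                buf.value[i] = input[i]; // throws ArrayIndexOutOfBoundsException at i == 4096
            }
        }

        public static void main(String[] args) {
            byte[] column = new byte[4876]; // the oversized dimension value from the log
            java.util.Arrays.fill(column, (byte) 'x');
            SplitBuffer buf = new SplitBuffer();
            try {
                split(buf, column); // first exception, caught like in the mapper
            } catch (ArrayIndexOutOfBoundsException ex) {
                ex.printStackTrace(System.err);
                // Mirrors toString(): reads buf.length (4876) bytes from a
                // 4096-byte array, so a second out-of-bounds exception
                // escapes the handler and kills the task, like the failed job.
                System.err.println("Insane record: "
                        + new String(buf.value, 0, buf.length, java.nio.charset.StandardCharsets.UTF_8));
            }
        }
    }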

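On the proposed fix: the eventual KYLIN-1104 patch is not shown here, but one defensive direction consistent with the thread is to make the diagnostic path itself unable to overrun, by clamping the printable length to the actual allocation. The sketch below uses assumed names (safeToString is not a real Kylin method):

    import java.nio.charset.StandardCharsets;

    // A sketch of the hardening suggested above (assumed names, not the
    // actual KYLIN-1104 patch): never let the error-reporting path read
    // past the allocation, even when the recorded length is larger.
    public class SafeSplitPrinting {

        static String safeToString(byte[] buffer, int recordedLength) {
            int safeLen = Math.min(recordedLength, buffer.length); // clamp to allocation
            String s = new String(buffer, 0, safeLen, StandardCharsets.UTF_8);
            if (recordedLength > buffer.length) {
                // Flag the truncation so the log still tells the full story.
                s += " ...(truncated: recorded length " + recordedLength
                        + ", buffer only " + buffer.length + " bytes)";
            }
            return s;
        }

        public static void main(String[] args) {
            byte[] buffer = new byte[8];
            for (int i = 0; i < buffer.length; i++) {
                buffer[i] = (byte) ('a' + i);
            }
            // A recorded length of 12 against an 8-byte allocation: formerly
            // a second crash, now a truncated-but-safe diagnostic line.
            System.out.println(safeToString(buffer, 12));
        }
    }

Making the 4096 limit configurable, as the thread suggests, would then only move the truncation point; the clamp keeps the error-reporting path safe regardless, so oversized rows can be logged and skipped instead of failing the whole job.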