Hi Dayue,

could you please open a JIRA for this and make it configurable? As far as I
know, Kylin now allows cube-level configurations to override kylin.properties,
so with this you could customize the magic number at the cube level.
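
For example, once it's configurable, a cube-level override might look like
this (the property name below is a hypothetical placeholder until the JIRA
settles on one):

  # hypothetical property name, set in the cube's configuration overrides
  kylin.cube.size-estimate-ratio=0.5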

Thanks,

2016-04-25 15:01 GMT+08:00 Li Yang <[email protected]>:

> The magic coefficient exists because of HBase compression on keys and values:
> the final cube size is much smaller than the sum of all keys and all values.
> That's why we multiply by the coefficient. It's purely empirical at the
> moment, and it should vary depending on the key encoding and the compression
> applied to the HTable.
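>
> (For instance, a compression ratio of roughly 4:1 on the raw key/value bytes
> would correspond to the 0.25 coefficient.)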
>
> At a minimum, we should make it configurable, I think.
>
> On Mon, Apr 18, 2016 at 4:38 PM, Dayue Gao <[email protected]> wrote:
>
> > Hi everyone,
> >
> >
> > I made several cubing tests on 1.5 and found that most of the time was
> > spent on the "Convert Cuboid Data to HFile" step, due to a lack of reducer
> > parallelism. It seems that the estimated cube size is far smaller than the
> > actual size, which leads to a small number of regions (and hence reducers)
> > being created. The setup and results of the tests were:
> >
> >
> > Cube#1: source_record=11998051, estimated_size=8805MB, coefficient=0.25,
> > region_cut=5GB, #regions=2, actual_size=49GB
> > Cube#2: source_record=123908390, estimated_size=4653MB, coefficient=0.05,
> > region_cut=10GB, #regions=2, actual_size=144GB
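> >
> > To make the effect concrete: for Cube#1, ceil(8805MB / 5GB) yields the 2
> > regions observed, whereas the actual 49GB would have warranted about 10
> > regions at the same cut size.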
> >
> >
> > The "coefficient" is from CubeStatsReader#estimateCuboidStorageSize,
> which
> > looks mysterious to me. Currently the formula for cuboid size estimation
> is
> >
> >
> >   size(cuboid) = rows(cuboid) x row_size(cuboid) x coefficient
> >   where coefficient = has_memory_hungry_measures(cube) ? 0.05 : 0.25
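> >
> > In code form, the estimation is roughly the following (a simplified sketch
> > with illustrative names, not the exact CubeStatsReader source):
> >
> >   // Simplified sketch of the estimate; names, units and structure are
> >   // illustrative, not the actual Kylin code.
> >   double estimateCuboidSizeMB(long rows, int rowSizeBytes, boolean memoryHungry) {
> >       double coefficient = memoryHungry ? 0.05 : 0.25;  // the magic numbers
> >       return rows * rowSizeBytes * coefficient / (1024.0 * 1024.0);
> >   }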
> >
> >
> > Why do we multiply by the coefficient? And why is it five times smaller in
> > the memory-hungry case? Could someone explain the rationale behind it?
> >
> >
> > Thanks, Dayue
> >
>



-- 
Best regards,

Shaofeng Shi
