Please check how many mappers are started in the "Convert Cuboid Data to
HFile" step. If the number is much higher than expected, the problem lies
in the "Build cube with spark" step.

You may need to tune this parameter to a bigger value to reduce the number
of output files:

kylin.engine.spark.rdd-partition-cut-mb=10

By default Spark will split the partition every 10 MB. When the cube
size is big, or there is a "COUNT DISTINCT" or "TopN" measure, many
partitions might be generated, which may slow down the next step.
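
To get a feel for the effect, here is a rough Python sketch of how the
partition count scales with the cut size. It is a simplified model, not
Kylin's actual code: the function name, the 20 GB cuboid size and the
min/max clamp bounds are all made up for illustration.

```python
import math


def estimate_partitions(cuboid_size_mb, cut_mb=10.0,
                        min_partitions=1, max_partitions=5000):
    """Simplified model: one partition per `cut_mb` of estimated cuboid data.

    The clamp bounds are illustrative placeholders, not confirmed defaults.
    """
    if cut_mb <= 0:
        raise ValueError("cut_mb must be positive")
    raw = math.ceil(cuboid_size_mb / cut_mb)
    # Keep the result inside the assumed [min_partitions, max_partitions] range.
    return max(min_partitions, min(max_partitions, raw))


# Hypothetical 20 GB of cuboid data, tried with three cut sizes.
for cut_mb in (10, 50, 100):
    print(cut_mb, estimate_partitions(20_000, cut_mb))
```

In this model, raising the cut size from 10 to 100 reduces the partition
count, and so the number of output files, by about ten times.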


2018-05-25 16:49 GMT+08:00 Li Yang <[email protected]>:

> Currently there is no Spark version of the step "Convert Cuboid Data to
> HFile". The best shot is to tune the Hadoop MR job of converting to HFile.
>
> I would suggest starting by checking the parallelism of the job. See if
> there are enough mappers and reducers started. If not, consider cutting
> the cuboid into smaller regions, which will increase the number of mappers
> and reducers.
>
> On Tue, May 8, 2018 at 2:56 AM, narendracs <[email protected]> wrote:
>
> > I am using the Spark engine for cube processing, and the "Convert Cuboid
> > Data to HFile" step is taking most of the time.
> > With 14 M input records and 3 dimensions (1 UHC), it took around 28 mins
> > to build the cube, of which 20 mins were spent just on Convert Cuboid Data
> > to HFile. I noticed this step runs as MapReduce even though I have
> > selected Spark as the engine type.
> > Is there any way to make this step run on Spark instead of MR?
> > Also, is there any configuration that can help optimize this step?
> >
> > thanks
> >
> > --
> > Sent from: http://apache-kylin.74782.x6.nabble.com/
> >
>



-- 
Best regards,

Shaofeng Shi 史少锋
