Please check how many mappers are started in the "Convert Cuboid Data to HFile" step. If the number is much higher than expected, the problem lies in the "Build Cube with Spark" step.
You may need to raise this parameter to reduce the number of output files:

kylin.engine.spark.rdd-partition-cut-mb=10

By default, Spark cuts a new partition every 10 MB. When the cube is large, or it has a "COUNT DISTINCT" or "TopN" measure, many partitions may be generated, which can slow down the next step.

2018-05-25 16:49 GMT+08:00 Li Yang <[email protected]>:

> Currently there is no Spark version of the step "Convert Cuboid Data to
> HFile". The best option is to tune the Hadoop MR job that converts to HFile.
>
> I would suggest starting by checking the parallelism of the job. See whether
> enough mappers and reducers are started. If not, consider cutting the cuboid
> into smaller regions, which will increase the number of mappers and
> reducers.
>
> On Tue, May 8, 2018 at 2:56 AM, narendracs <[email protected]> wrote:
>
> > I am using the Spark engine for cube processing, and the "Convert Cuboid
> > Data to HFile" step is taking most of the time.
> > With 14 M input records and 3 dimensions (1 UHC), it took around 28
> > minutes to build the cube, of which 20 minutes were spent just on Convert
> > Cuboid Data to HFile. I noticed this step runs MapReduce even though I
> > have selected Spark as the engine type.
> > Is there any way to make this step run on Spark instead of MR?
> > Is there also any configuration that can help optimize this step?
> >
> > Thanks
> >
> > --
> > Sent from: http://apache-kylin.74782.x6.nabble.com/
> >
>

--
Best regards,

Shaofeng Shi 史少锋
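To get a feel for how kylin.engine.spark.rdd-partition-cut-mb affects the number of partitions (and so the number of files handed to the HFile step), here is a rough sketch of the arithmetic. It assumes the partition count is roughly the estimated cuboid size divided by the cut size; the function name `estimate_partitions` and the 5000 MB size are hypothetical and used only for illustration, and Kylin's real estimate may apply further adjustments.

```python
import math


def estimate_partitions(cuboid_size_mb: float, cut_mb: float = 10.0) -> int:
    """Hypothetical helper: approximate partition count as size / cut size.

    This is a simplified model for illustration, not Kylin's actual code.
    """
    if cut_mb <= 0:
        raise ValueError("cut_mb must be positive")
    # Round up, and always keep at least one partition.
    return max(1, math.ceil(cuboid_size_mb / cut_mb))


if __name__ == "__main__":
    size_mb = 5000  # assumed cuboid size, purely illustrative
    for cut_mb in (10, 50, 100):
        print(f"cut-mb={cut_mb}: ~{estimate_partitions(size_mb, cut_mb)} partitions")
```

Under this simplified model, raising the cut size from 10 MB to 100 MB reduces the partition count by about a factor of ten.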
