Hi, you can observe the sizes of the Spark-generated cuboid files in Kylin's working directory (for example: /kylin/kylin_metadata/kylin-fd785bab-b875-4626-8bc3-7d46e8862d88/kylin_sales_cube/cuboid/level_base_cuboid/; replace the UUID and cube name with yours). If there are many small files (e.g., several MBs each), you should increase this configuration to make the partitions bigger (e.g., 64 MB). Usually this is needed when your cube has some advanced measures like count distinct, TopN, percentile, etc., whose size estimation can be quite inaccurate.
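To see why the setting matters, here is a minimal sketch of how a size-based partition cut like "kylin.engine.spark.rdd-partition-cut-mb" could translate an estimated cuboid size into an RDD partition count. The formula, the 10 MB default, and the min/max clamping values are illustrative assumptions, not Kylin's verbatim code:

```python
import math

def estimate_partition_num(estimated_size_mb: float,
                           cut_mb: float = 10.0,
                           min_partitions: int = 1,
                           max_partitions: int = 5000) -> int:
    """Sketch: number of RDD partitions for a cuboid of the given size.

    Each partition targets roughly ``cut_mb`` megabytes; the result is
    clamped to the assumed [min_partitions, max_partitions] range.
    """
    n = math.ceil(estimated_size_mb / cut_mb)
    return max(min_partitions, min(n, max_partitions))

# With a 10 MB cut, a 640 MB cuboid is split into 64 partitions; raising
# the cut to 64 MB yields 10 partitions, i.e. fewer but larger files.
print(estimate_partition_num(640, cut_mb=10))   # -> 64
print(estimate_partition_num(640, cut_mb=64))   # -> 10
```

The point is the trade-off: a larger cut-mb means fewer, bigger partitions (and output files), which helps when the size estimation overshoots and produces many tiny files.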
This situation improved in v2.5.0, as we enhanced the size estimation for those measures. With 2.5 you don't need to care much about it, I think.

vishnuvardhanG <[email protected]> wrote on Fri, Sep 28, 2018 at 6:41 PM:

> http://kylin.apache.org/docs20/tutorial/cube_spark.html
>
> The above link mentions the effect of
> "kylin.engine.spark.rdd-partition-cut-mb" on cube building performance.
>
> How do I decide the optimum value of
> "kylin.engine.spark.rdd-partition-cut-mb" for cube creation?

--
Best regards,

Shaofeng Shi 史少锋
