Repository: kylin
Updated Branches:
  refs/heads/document 4c1c736eb -> f17075c13


Add dynamic resource allocation in spark cubing

Project: http://git-wip-us.apache.org/repos/asf/kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/90b4e3bd
Tree: http://git-wip-us.apache.org/repos/asf/kylin/tree/90b4e3bd
Diff: http://git-wip-us.apache.org/repos/asf/kylin/diff/90b4e3bd

Branch: refs/heads/document
Commit: 90b4e3bd313dceeba0cb49284f38b6acdb2a3296
Parents: 4c1c736
Author: shaofengshi <[email protected]>
Authored: Sat Jul 22 09:35:07 2017 +0800
Committer: shaofengshi <[email protected]>
Committed: Sat Jul 22 09:35:07 2017 +0800

----------------------------------------------------------------------
 website/_docs20/tutorial/cube_spark.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kylin/blob/90b4e3bd/website/_docs20/tutorial/cube_spark.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/cube_spark.md b/website/_docs20/tutorial/cube_spark.md
index 8ec8f04..d5c12e3 100644
--- a/website/_docs20/tutorial/cube_spark.md
+++ b/website/_docs20/tutorial/cube_spark.md
@@ -162,6 +162,6 @@ Click a specific job, there you will see the detail runtime information, that is
 
 ## Go further
 
-If you're a Kylin administrator but new to Spark, suggest you go through [Spark documents](https://spark.apache.org/docs/1.6.3/), and don't forget to update the configurations accordingly. Spark's performance relies on Cluster's memory and CPU resource, while Kylin's Cube build is a heavy task when having a complex data model and a huge dataset to build at one time. If your cluster resource couldn't fulfill, errors like "OutOfMemorry" will be thrown in Spark executors, so please use it properly. For Cube which has UHC dimension, many combinations (e.g, a full cube with more than 12 dimensions), or memory hungry measures (Count Distinct, Top-N), suggest to use the MapReduce engine. If your Cube model is simple, all measures are SUM/MIN/MAX/COUNT, source data is small to medium scale, Spark engine would be a good choice. Besides, Streaming build isn't supported in this engine so far (KYLIN-2484).
+If you're a Kylin administrator but new to Spark, we suggest you go through the [Spark documents](https://spark.apache.org/docs/1.6.3/) and update the configurations accordingly. You can enable Spark [Dynamic Resource Allocation](https://spark.apache.org/docs/1.6.1/configuration.html#dynamic-allocation) so that executors are scaled up and released automatically for different workloads. Spark's performance relies on the cluster's memory and CPU resources, while Kylin's Cube build is a heavy task when a complex data model and a huge dataset are built at one time. If your cluster resources can't meet the demand, errors like "OutOfMemoryError" will be thrown in Spark executors, so please use it properly. For a Cube that has UHC dimensions, many combinations (e.g., a full cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), we suggest using the MapReduce engine. If your Cube model is simple, all measures are SUM/MIN/MAX/COUNT, and the source data is of small to medium scale, the Spark engine would be a good choice. Besides, the Streaming build isn't supported in this engine so far (KYLIN-2484). The Spark engine is now in public beta; if you have any question, comment, or bug fix, you're welcome to discuss it on [email protected].
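The Dynamic Resource Allocation mentioned in the added text is driven by Spark properties, which Kylin forwards to `spark-submit` from `kylin.properties` entries carrying the `kylin.engine.spark-conf.` prefix. A minimal sketch follows; the executor bounds are illustrative values, not recommendations, and dynamic allocation additionally requires the external shuffle service to be running on the YARN NodeManagers:

```properties
# Spark properties passed through Kylin; the "kylin.engine.spark-conf."
# prefix is stripped before they are handed to spark-submit.
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
# Dynamic allocation requires the external shuffle service on YARN.
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
# Illustrative bounds -- tune to your cluster's capacity.
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=20
```

With these set, a fixed `spark.executor.instances` is no longer needed; Spark requests executors as cubing stages queue up and releases idle ones back to YARN.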
