Repository: kylin
Updated Branches:
  refs/heads/document 4c1c736eb -> f17075c13


Add dynamic resource allocation in spark cubing

Project: http://git-wip-us.apache.org/repos/asf/kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/90b4e3bd
Tree: http://git-wip-us.apache.org/repos/asf/kylin/tree/90b4e3bd
Diff: http://git-wip-us.apache.org/repos/asf/kylin/diff/90b4e3bd

Branch: refs/heads/document
Commit: 90b4e3bd313dceeba0cb49284f38b6acdb2a3296
Parents: 4c1c736
Author: shaofengshi <[email protected]>
Authored: Sat Jul 22 09:35:07 2017 +0800
Committer: shaofengshi <[email protected]>
Committed: Sat Jul 22 09:35:07 2017 +0800

----------------------------------------------------------------------
 website/_docs20/tutorial/cube_spark.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kylin/blob/90b4e3bd/website/_docs20/tutorial/cube_spark.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/cube_spark.md b/website/_docs20/tutorial/cube_spark.md
index 8ec8f04..d5c12e3 100644
--- a/website/_docs20/tutorial/cube_spark.md
+++ b/website/_docs20/tutorial/cube_spark.md
@@ -162,6 +162,6 @@ Click a specific job, there you will see the detail runtime information, that is
 
 ## Go further
 
-If you're a Kylin administrator but new to Spark, suggest you go through [Spark documents](https://spark.apache.org/docs/1.6.3/), and don't forget to update the configurations accordingly. Spark's performance relies on Cluster's memory and CPU resource, while Kylin's Cube build is a heavy task when having a complex data model and a huge dataset to build at one time. If your cluster resource couldn't fulfill, errors like "OutOfMemorry" will be thrown in Spark executors, so please use it properly. For Cube which has UHC dimension, many combinations (e.g, a full cube with more than 12 dimensions), or memory hungry measures (Count Distinct, Top-N), suggest to use the MapReduce engine. If your Cube model is simple, all measures are SUM/MIN/MAX/COUNT, source data is small to medium scale, Spark engine would be a good choice. Besides, Streaming build isn't supported in this engine so far (KYLIN-2484).
+If you're a Kylin administrator but new to Spark, we suggest you go through the [Spark documents](https://spark.apache.org/docs/1.6.3/) and update the configurations accordingly. You can enable Spark [Dynamic Resource Allocation](https://spark.apache.org/docs/1.6.1/configuration.html#dynamic-allocation) so that executors are scaled up and released automatically for different workloads. Spark's performance relies on the cluster's memory and CPU resources, while Kylin's Cube build is a heavy task when a complex data model and a huge dataset are built at one time. If your cluster resources can't meet the demand, errors like "OutOfMemoryError" will be thrown in Spark executors, so please use it properly. For a Cube that has UHC dimensions, many combinations (e.g., a full cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), we suggest using the MapReduce engine. If your Cube model is simple, all measures are SUM/MIN/MAX/COUNT, and the source data is of small to medium scale, the Spark engine would be a good choice. Besides, the Streaming build isn't supported in this engine so far (KYLIN-2484). The Spark engine is now in public beta; if you have any question, comment, or bug fix, you're welcome to discuss it on [email protected].
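The Dynamic Resource Allocation mentioned in the added text is driven by Spark properties, which Kylin forwards to `spark-submit` from `kylin.properties` entries carrying the `kylin.engine.spark-conf.` prefix. A minimal sketch follows; the executor bounds are illustrative values, not recommendations, and dynamic allocation additionally requires the external shuffle service to be running on the YARN NodeManagers:

```properties
# Spark properties passed through Kylin; the "kylin.engine.spark-conf."
# prefix is stripped before they are handed to spark-submit.
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
# Dynamic allocation requires the external shuffle service on YARN.
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
# Illustrative bounds -- tune to your cluster's capacity.
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=20
```

With these set, a fixed `spark.executor.instances` is no longer needed; Spark requests executors as cubing stages queue up and releases idle ones back to YARN.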
