This is an automated email from the ASF dual-hosted git repository.

shaofengshi pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git
commit 8087b4b691257b5e859ddf80d1987943fa354f9b
Author: shaofengshi <shaofeng...@apache.org>
AuthorDate: Thu Jan 10 08:52:15 2019 +0800

    Update cube_spark document with KYLIN-3607
---
 website/_docs/tutorial/cube_spark.cn.md | 18 ++++++++++++++++++
 website/_docs/tutorial/cube_spark.md    | 20 +++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/website/_docs/tutorial/cube_spark.cn.md b/website/_docs/tutorial/cube_spark.cn.md
index 913be9f..0bc7dee 100644
--- a/website/_docs/tutorial/cube_spark.cn.md
+++ b/website/_docs/tutorial/cube_spark.cn.md
@@ -158,6 +158,24 @@ $KYLIN_HOME/spark/sbin/start-history-server.sh hdfs://sandbox.hortonworks.com:80
 Click a specific job to see its detailed runtime information, which is very helpful for troubleshooting and performance tuning.
 
+On some Hadoop releases, you may encounter the following error in the "Convert Cuboid Data to HFile" step:
+
+{% highlight Groff markup %}
+Caused by: java.lang.RuntimeException: Could not create interface org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactory Is the hadoop compatibility jar on the classpath?
+	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:73)
+	at org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:31)
+	at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:192)
+	... 15 more
+Caused by: java.util.NoSuchElementException
+	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:365)
+	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
+	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
+	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:59)
+	... 17 more
+{% endhighlight %}
+
+The workaround is: copy `hbase-hadoop2-compat-*.jar` and `hbase-hadoop-compat-*.jar` into `$KYLIN_HOME/spark/jars` (both jar files can be found in HBase's lib folder). If you have already built the Spark assembly jar and uploaded it to HDFS, you need to re-package and re-upload it. After that, resume the failed cube job; it should now succeed. The related JIRA issue is KYLIN-3607, which will be fixed in a later release.
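For illustration, the jar-copy workaround above can be sketched as a short shell session. This is a self-contained demo in temporary directories, not the real operation: the jar version numbers are invented, and on an actual cluster `HBASE_LIB` would be HBase's lib folder and `SPARK_JARS` would be `$KYLIN_HOME/spark/jars`.

```shell
# Demo of the KYLIN-3607 workaround in a temp sandbox; on a real cluster,
# point HBASE_LIB at HBase's lib folder and SPARK_JARS at "$KYLIN_HOME/spark/jars".
# The jar file names/versions below are made up for this demo.
HBASE_LIB=$(mktemp -d)
SPARK_JARS=$(mktemp -d)
touch "$HBASE_LIB/hbase-hadoop2-compat-1.1.2.jar" \
      "$HBASE_LIB/hbase-hadoop-compat-1.1.2.jar"

# Copy both HBase compatibility jars next to Spark's own jars:
cp "$HBASE_LIB"/hbase-hadoop2-compat-*.jar "$SPARK_JARS"/
cp "$HBASE_LIB"/hbase-hadoop-compat-*.jar  "$SPARK_JARS"/

ls "$SPARK_JARS"
# If you ship a Spark assembly jar on HDFS, rebuild and re-upload it afterwards, e.g.:
#   jar cv0f spark-libs.jar -C "$KYLIN_HOME/spark/jars/" .
#   hadoop fs -put -f spark-libs.jar <your HDFS path>
```

After the copy, resuming the failed "Convert Cuboid Data to HFile" step picks up the compatibility classes from the Spark classpath.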
+
 ## Go further
 
 If you're a Kylin administrator but new to Spark, we suggest you go through the [Spark documentation](https://spark.apache.org/docs/2.1.2/), and don't forget to update the configurations accordingly. You can enable Spark's [Dynamic Resource Allocation](https://spark.apache.org/docs/2.1.2/job-scheduling.html#dynamic-resource-allocation) so that it can automatically scale up and down for different workloads. Spark's performance relies on the cluster's memory and CPU resources, and Kylin's cube build is a heavy task when a complex data model and a huge dataset are built in one pass. If your cluster resources are insufficient, the Spark executors will throw errors such as "OutOfMemory", so use it appropriately. For cubes with UHC dimensions, too many combinations (e.g., a cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), we suggest you use the MapReduce e [...]

diff --git a/website/_docs/tutorial/cube_spark.md b/website/_docs/tutorial/cube_spark.md
index 9cfe366..2ac27d7 100644
--- a/website/_docs/tutorial/cube_spark.md
+++ b/website/_docs/tutorial/cube_spark.md
@@ -29,7 +29,7 @@ To run Spark on Yarn, need specify **HADOOP_CONF_DIR** environment variable, whi
 
 ## Check Spark configuration
 
-Kylin embeds a Spark binary (v2.1.0) in $KYLIN_HOME/spark, all the Spark configurations can be managed in $KYLIN_HOME/conf/kylin.properties with prefix *"kylin.engine.spark-conf."*. These properties will be extracted and applied when runs submit Spark job; E.g, if you configure "kylin.engine.spark-conf.spark.executor.memory=4G", Kylin will use "--conf spark.executor.memory=4G" as parameter when execute "spark-submit".
+Kylin embeds a Spark binary (Spark v2.1 for Kylin 2.4 and 2.5) in $KYLIN_HOME/spark; all the Spark configurations can be managed in $KYLIN_HOME/conf/kylin.properties with the prefix *"kylin.engine.spark-conf."*. These properties are extracted and applied when Kylin submits a Spark job; e.g., if you configure "kylin.engine.spark-conf.spark.executor.memory=4G", Kylin will pass "--conf spark.executor.memory=4G" to "spark-submit".
 
 Before you run Spark cubing, we suggest you review these configurations and customize them for your cluster.
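To make the property-to-flag mapping concrete, here is a minimal shell sketch (illustrative only, not Kylin's actual implementation): it strips the `kylin.engine.spark-conf.` prefix from each matching property and emits a `--conf key=value` argument, as the paragraph above describes. The sample property values are assumptions, not recommendations.

```shell
# Illustrative only: turn "kylin.engine.spark-conf.*" properties into
# "--conf key=value" arguments for spark-submit, as the document describes.
prefix="kylin.engine.spark-conf."
args=""
while IFS='=' read -r key value; do
  case "$key" in
    "$prefix"*) args="$args --conf ${key#$prefix}=$value" ;;
  esac
done <<'EOF'
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.executor.cores=2
EOF

# prints: spark-submit --conf spark.executor.memory=4G --conf spark.executor.cores=2 ...
echo "spark-submit$args ..."
```

Properties without the prefix fall through the `case` and are ignored, which mirrors the idea that only `kylin.engine.spark-conf.*` entries reach `spark-submit`.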
Below are the recommended configurations:

@@ -152,6 +152,24 @@ In web browser, access "http://sandbox:18080" it shows the job history:
 
 Click a specific job to see its detailed runtime information, which is very helpful for troubleshooting and performance tuning.
 
+On some Hadoop releases, you may encounter the following error in the "Convert Cuboid Data to HFile" step:
+
+{% highlight Groff markup %}
+Caused by: java.lang.RuntimeException: Could not create interface org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactory Is the hadoop compatibility jar on the classpath?
+	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:73)
+	at org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:31)
+	at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:192)
+	... 15 more
+Caused by: java.util.NoSuchElementException
+	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:365)
+	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
+	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
+	at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:59)
+	... 17 more
+{% endhighlight %}
+
+The workaround is: add `hbase-hadoop2-compat-*.jar` and `hbase-hadoop-compat-*.jar` into `$KYLIN_HOME/spark/jars` (the two jar files can be found in HBase's lib folder). If you have already built the Spark assembly jar and uploaded it to HDFS, you may need to re-package and re-upload it. After that, resume the failed job; it should be successful. The related issue is KYLIN-3607, which will be fixed in a later version.
+
 ## Go further
 
 If you're a Kylin administrator but new to Spark, we suggest you go through the [Spark documentation](https://spark.apache.org/docs/2.1.0/), and don't forget to update the configurations accordingly.
You can enable Spark [Dynamic Resource Allocation](https://spark.apache.org/docs/2.1.0/job-scheduling.html#dynamic-resource-allocation) so that it can automatically scale up and down for different workloads. Spark's performance relies on the cluster's memory and CPU resources, while Kylin's cube build is a heavy task whe [...]