shaofengshi closed pull request #329: add english version of installation and
format the chapter of install…
URL: https://github.com/apache/kylin/pull/329
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/website/_data/docs.yml b/website/_data/docs.yml
index 49fc11e9f7..bd33d5f48a 100644
--- a/website/_data/docs.yml
+++ b/website/_data/docs.yml
@@ -30,7 +30,6 @@
- install/index
- install/kylin_cluster
- install/configuration
- - install/advance_settings
- install/kylin_aws_emr
- install/kylin_docker
diff --git a/website/_dev/doc_spec.cn.md b/website/_dev/doc_spec.cn.md
index 5fd87ae304..1089cd331a 100644
--- a/website/_dev/doc_spec.cn.md
+++ b/website/_dev/doc_spec.cn.md
@@ -49,7 +49,7 @@ permalink: /cn/development/doc_spec.html
Use code blocks to mark **all shell commands and config settings that users need to run**, in a consistent and sufficiently prominent format. For example:
1. Shell commands
- \`\`\`shell
+ \`\`\`sh
$KYLIN_HOME/bin/kylin.sh start
\`\`\`
diff --git a/website/_docs/install/configuration.cn.md
b/website/_docs/install/configuration.cn.md
index c7bd63dedb..558b97fef2 100644
--- a/website/_docs/install/configuration.cn.md
+++ b/website/_docs/install/configuration.cn.md
@@ -9,12 +9,12 @@ permalink: /cn/docs/install/configuration.html
- [Configuration Files and Overriding](#kylin-config)
  - [Kylin Configuration Files](#kylin-config)
-  - [Configuration Overriding](#config-overwrite)
-  - [Project-level Configuration Overriding](#project-config-overwrite)
-  - [Cube-level Configuration Overriding](#cube-config-overwrite)
-  - [Overriding MapReduce Job Parameters](#mr-config-overwrite)
-  - [Overriding Hive Parameters](#hive-config-overwrite)
-  - [Overriding Spark Parameters](#spark-config-overwrite)
+  - [Configuration Overriding](#config-override)
+  - [Project-level Configuration Overriding](#project-config-override)
+  - [Cube-level Configuration Overriding](#cube-config-override)
+  - [Overriding MapReduce Parameters](#mr-config-override)
+  - [Overriding Hive Parameters](#hive-config-override)
+  - [Overriding Spark Parameters](#spark-config-override)
- [Deployment Configuration](#kylin-deploy)
  - [Deploy Kylin](#deploy-config)
  - [Job Engine HA](#job-engine-ha)
@@ -22,7 +22,7 @@ permalink: /cn/docs/install/configuration.html
- [RESTful Webservice](#rest-config)
- [Metastore Configuration](#kylin_metastore)
  - [Metadata](#metadata)
-  - [Using MySQL as the Metastore (beta)](#mysql-metastore)
+  - [MySQL-based Metastore (beta)](#mysql-metastore)
- [Build Configuration](#kylin-build)
  - [Hive Client & SparkSQL](#hive-client-and-sparksql)
  - [JDBC Datasource Configuration](#jdbc-datasource)
@@ -50,10 +50,10 @@ permalink: /cn/docs/install/configuration.html
  - [Query Pushdown](#query-pushdown)
  - [Query Rewriting](#convert-sql)
  - [Collect Query Metrics to JMX](#jmx-metrics)
-  - [Collect Query Metrics to dropwizard](#dropwizard-metrics)
+  - [Collect Query Metrics to dropwizard](#dropwizard-metrics)
- [Security Configuration](#kylin-security)
  - [Integrate LDAP for SSO](#ldap-sso)
-  - [Integration with Apache Ranger](#ranger)
+  - [Integrate Apache Ranger](#ranger)
  - [Enable ZooKeeper ACL](#zookeeper-acl)
@@ -80,47 +80,47 @@ Kylin's configuration files are as follows:
-### Configuration Overriding {#config-overwrite}
+### Configuration Overriding {#config-override}
Some configuration properties in `$KYLIN_HOME/conf/` can be overridden in the Web UI. Overriding has two scopes: **project-level overrides** and **Cube-level overrides**. The priority order is: Cube-level overrides > project-level overrides > global configuration files.
-### Project-level Configuration Overriding {#project-config-overwrite}
+### Project-level Configuration Overriding {#project-config-override}
-In the Web UI, click "**Manage Project**", select a project, then click "**Edit**" -> "**Project Config**" -> "**+ Property**" to override configuration at the project level, as shown in the figure below:
-
+In the Web UI, click **Manage Project**, select a project, then click **Edit** -> **Project Config** -> **+ Property** to override configuration at the project level, as shown in the figure below:
+
-### Cube-level Configuration Overriding {#cube-config-overwrite}
+### Cube-level Configuration Overriding {#cube-config-override}
-In the **Configuration Overwrites** step of **Cube Designer**, configuration properties can be added to override configuration at the Cube level, as shown in the figure below:
-
+In the **Configuration overrides** step of **Cube Designer**, configuration properties can be added to override configuration at the Cube level, as shown in the figure below:
+
-### Overriding MapReduce Job Parameters {#mr-config-overwrite}
+### Overriding MapReduce Job Parameters {#mr-config-override}
Kylin supports overriding parameters in `kylin_job_conf.xml` and `kylin_job_conf_inmem.xml` at the project and Cube level, as key-value pairs, in the following format:
`kylin.job.mr.config.override.<key> = <value>`
-If you want the Cube build job to use a different YARN resource queue, you can set: `kylin.engine.mr.config-override.mapreduce.job.queuename={queueName}`
+If users want the Cube build job to use a different YARN resource queue, they can set: `kylin.engine.mr.config-override.mapreduce.job.queuename={queueName}`
-### Overriding Hive Parameters {#hive-config-overwrite}
+### Overriding Hive Parameters {#hive-config-override}
Kylin supports overriding parameters in `kylin_hive_conf.xml` at the project and Cube level, as key-value pairs, in the following format:
`kylin.source.hive.config-override.<key> = <value>`
-If you want Hive to use a different YARN resource queue, you can set: `kylin.source.hive.config-override.mapreduce.job.queuename={queueName}`
+If users want Hive to use a different YARN resource queue, they can set: `kylin.source.hive.config-override.mapreduce.job.queuename={queueName}`
-### Overriding Spark Parameters {#spark-config-overwrite}
+### Overriding Spark Parameters {#spark-config-override}
Kylin supports overriding the Spark parameters in `kylin.properties` at the project and Cube level, as key-value pairs, in the following format:
`kylin.engine.spark-conf.<key> = <value>`
-If you want Spark to use a different YARN resource queue, you can set: `kylin.engine.spark-conf.spark.yarn.queue={queueName}`
+If users want Spark to use a different YARN resource queue, they can set: `kylin.engine.spark-conf.spark.yarn.queue={queueName}`
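As a quick illustration of the three override formats above, a minimal sketch (the queue name `kylin_queue` is a hypothetical placeholder) that routes the MapReduce, Hive and Spark stages of a build to one dedicated YARN queue could look like:

```properties
# Hypothetical Cube-level (or project-level) overrides; each follows the
# config-override / spark-conf key-value patterns described above
kylin.engine.mr.config-override.mapreduce.job.queuename=kylin_queue
kylin.source.hive.config-override.mapreduce.job.queuename=kylin_queue
kylin.engine.spark-conf.spark.yarn.queue=kylin_queue
```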
@@ -132,12 +132,12 @@ Kylin supports overriding the Spark parameters in `kylin.properties` at the
### Deploy Kylin {#deploy-config}
-- `kylin.env`: specifies the purpose of the Kylin deployment. Optional values are `DEV`, `QA` or `PROD`. The default value is `DEV`; some developer functions are enabled in DEV mode
- `kylin.env.hdfs-working-dir`: specifies the HDFS path used by the Kylin service. The default value is `/kylin`. Make sure the user who starts the Kylin instance has permission to read and write this directory
+- `kylin.env`: specifies the purpose of the Kylin deployment. Optional values are (`DEV` | `QA` | `PROD`). The default value is `DEV`; some developer functions are enabled in DEV mode
- `kylin.env.zookeeper-base-path`: specifies the ZooKeeper path used by the Kylin service. The default value is `/kylin`
- `kylin.env.zookeeper-connect-string`: specifies the ZooKeeper connection string. If it is empty, HBase's ZooKeeper is used
- `kylin.env.hadoop-conf-dir`: specifies the Hadoop configuration file directory. If not specified, `HADOOP_CONF_DIR` is read from the environment
-- `kylin.server.mode`: specifies the running mode of the Kylin instance. Optional values are `all`, `job`, `query`. The default value is `all`; job mode means the instance is only used for job scheduling, not for queries; query mode means the instance is only used for queries, not for scheduling build jobs; all mode means the instance is used for both job scheduling and SQL queries.
+- `kylin.server.mode`: specifies the running mode of the Kylin instance. Optional values are (`all` | `job` | `query`). The default value is `all`; job mode means the instance is only used for job scheduling, not for queries; query mode means the instance is only used for queries, not for scheduling build jobs; all mode means the instance is used for both job scheduling and SQL queries.
- `kylin.server.cluster-name`: specifies the cluster name
@@ -189,17 +189,17 @@ export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX:MaxPermSize=512M -v
- `kylin.metadata.hbase-client-scanner-timeout-period`: the total timeout between an HBase client issuing a scan RPC call and receiving its response. The default value is 10000 (ms)
- `kylin.metadata.hbase-rpc-timeout`: specifies the timeout for HBase RPC operations. The default value is 5000 (ms)
- `kylin.metadata.hbase-client-retries-number`: specifies the number of HBase retries. The default value is 1
-- `kylin.metadata.resource-store-provider.jdbc`: specifies the class used by JDBC. The default value is org.apache.kylin.common.persistence.JDBCResourceStore
+- `kylin.metadata.resource-store-provider.jdbc`: specifies the class used by JDBC. The default value is `org.apache.kylin.common.persistence.JDBCResourceStore`
-### Using MySQL as the Metastore (beta) {#mysql-metastore}
+### MySQL-based Metastore (beta) {#mysql-metastore}
-> **Note**: This feature is still being tested; you are advised to use it with caution.
+> **Note**: This feature is still being tested; users are advised to use it with caution.
- `kylin.metadata.url`: specifies the metadata path
- `kylin.metadata.jdbc.dialect`: specifies the JDBC dialect
-- `kylin.metadata.jdbc.json-always-small-cell`: the default value is true
+- `kylin.metadata.jdbc.json-always-small-cell`: the default value is TRUE
- `kylin.metadata.jdbc.small-cell-meta-size-warning-threshold`: the default value is 100 (MB)
- `kylin.metadata.jdbc.small-cell-meta-size-error-threshold`: the default value is 1 (GB)
- `kylin.metadata.jdbc.max-cell-size`: the default value is 1 (MB)
@@ -218,7 +218,7 @@ export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX:MaxPermSize=512M -v
### Hive Client & SparkSQL {#hive-client-and-sparksql}
- `kylin.source.hive.client`: specifies the Hive command-line type. Optional values are cli or beeline. The default value is cli
-- `kylin.source.hive.beeline-shell`: specifies the absolute path of the Beeline shell. The default is beeline
+- `kylin.source.hive.beeline-shell`: specifies the absolute path of the Beeline shell. The default value is beeline
- `kylin.source.hive.beeline-params`: when using Beeline as Hive's client tool, this parameter needs to be configured to provide more information to Beeline
- `kylin.source.hive.enable-sparksql-for-table-ops`: the default value is FALSE; it needs to be set to TRUE when using SparkSQL
- `kylin.source.hive.sparksql-beeline-shell`: when using SparkSQL Beeline as Hive's client tool, this parameter needs to be set to /path/to/spark-client/bin/beeline
@@ -256,11 +256,11 @@ export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX:MaxPermSize=512M -v
### Cube Settings {#cube-config}
- `kylin.cube.ignore-signature-inconsistency`: the signature in the Cube desc ensures the Cube is not changed into a corrupt state. The default value is FALSE
-- `kylin.cube.aggrgroup.max-combination`: specifies the upper limit of Cuboids in a Cube's aggregation group. The default value is 32768. Changing it is not recommended; too many Cuboids make both build time and expansion rate unacceptable
+- `kylin.cube.aggrgroup.max-combination`: specifies the upper limit of Cuboids in a Cube's aggregation group. The default value is 32768
- `kylin.cube.aggrgroup.is-mandatory-only-valid`: whether to allow a Cube that contains only the Base Cuboid. The default value is FALSE; it needs to be set to TRUE when using Spark Cubing
- `kylin.cube.rowkey.max-size`: specifies the maximum number of columns that can be set as Rowkeys. The default value is 63
- `kylin.cube.allow-appear-in-multiple-projects`: whether to allow one Cube to appear in multiple projects
-- `kylin.cube.gtscanrequest-serialization-level`: the default is 1
+- `kylin.cube.gtscanrequest-serialization-level`: the default value is 1
@@ -277,7 +277,7 @@ Both Kylin and HBase compress data when writing to disk, so Kylin multiplies the original
### Cube Build Algorithm {#cube-algorithm}
-- `kylin.cube.algorithm`: specifies the Cube build algorithm. Optional values are `auto`, `layer` and `inmem`. The default value is `auto`, i.e. Kylin dynamically selects an algorithm (layer or inmem) by sampling data; if you know Kylin, your data and your cluster well, you can set your preferred algorithm directly
+- `kylin.cube.algorithm`: specifies the Cube build algorithm. Optional values are `auto`, `layer` and `inmem`. The default value is `auto`, i.e. Kylin dynamically selects an algorithm (layer or inmem) by sampling data; users who know Kylin, their data and their cluster well can set their preferred algorithm directly
- `kylin.cube.algorithm.layer-or-inmem-threshold`: the default value is 7
- `kylin.cube.algorithm.inmem-split-limit`: the default value is 500
- `kylin.cube.algorithm.inmem-concurrent-threads`: the default value is 1
@@ -293,7 +293,7 @@ Both Kylin and HBase compress data when writing to disk, so Kylin multiplies the original
### Lookup Table Snapshot {#snapshot}
-- `kylin.snapshot.max-mb`: the upper limit of a lookup table snapshot's size. The default value is 300 (M)
+- `kylin.snapshot.max-mb`: the upper limit of a lookup table snapshot's size. The default value is 300(M)
- `kylin.snapshot.max-cache-entry`: the maximum number of snapshots that can be stored in the cache. The default value is 500
- `kylin.snapshot.ext.shard-mb`: sets the HBase shard size for storing lookup table snapshots. The default value is 500(M)
- `kylin.snapshot.ext.local.cache.path`: the local cache path. The default value is lookup_cache
@@ -305,7 +305,7 @@ Both Kylin and HBase compress data when writing to disk, so Kylin multiplies the original
- `kylin.storage.default`: specifies the default build engine. The default value is 2, i.e. HBase
- `kylin.source.hive.keep-flat-table`: whether to keep the Hive intermediate table after the build completes. The default value is FALSE
-- `kylin.source.hive.database-for-flat-table`: specifies the name of the Hive database that stores the Hive intermediate tables. The default is default. Make sure the user who starts the Kylin instance has permission to operate on this database
+- `kylin.source.hive.database-for-flat-table`: specifies the name of the Hive database that stores the Hive intermediate tables. The default value is default. Make sure the user who starts the Kylin instance has permission to operate on this database
- `kylin.source.hive.flat-table-storage-format`: specifies the storage format of the Hive intermediate table. The default value is SEQUENCEFILE
- `kylin.source.hive.flat-table-field-delimiter`: specifies the delimiter of the Hive intermediate table. The default value is \u001F
- `kylin.source.hive.redistribute-flat-table`: whether to redistribute the Hive flat table. The default value is TRUE
@@ -315,8 +315,8 @@ Both Kylin and HBase compress data when writing to disk, so Kylin multiplies the original
- `kylin.engine.mr.lib-dir`: specifies the path of the jar packages used by MapReduce jobs
- `kylin.engine.mr.reduce-input-mb`: before a MapReduce job starts, the total amount of data the Reducers will receive is estimated from the input and divided by this parameter to get the number of Reducers. The default value is 500(MB)
- `kylin.engine.mr.reduce-count-ratio`: used to estimate the number of Reducers. The default value is 1.0
-- `kylin.engine.mr.min-reducer-number`: the minimum number of Reducers in a MapReduce job. The default is 1
-- `kylin.engine.mr.max-reducer-number`: the maximum number of Reducers in a MapReduce job. The default is 500
+- `kylin.engine.mr.min-reducer-number`: the minimum number of Reducers in a MapReduce job. The default value is 1
+- `kylin.engine.mr.max-reducer-number`: the maximum number of Reducers in a MapReduce job. The default value is 500
- `kylin.engine.mr.mapper-input-rows`: the number of rows each Mapper can handle. The default value is 1000000; lowering this value starts more Mappers
- `kylin.engine.mr.max-cuboid-stats-calculator-number`: the number of threads used to calculate Cube statistics. The default value is 1
- `kylin.engine.mr.build-dict-in-reducer`: whether to build the dictionary in the Reduce phase of the build step **Extract Fact Table Distinct Columns**. The default value is `TRUE`
@@ -348,15 +348,12 @@ By default the Cube build, in the **Extract Fact Table Distinct Column** step, for each
### Spark Build Engine {#spark-cubing}
-Kylin supports using Spark as the Cube build engine. For details, please refer to [Build Cube with Spark](/docs/tutorial/cube_spark.html).
-The parameters related to Spark Cubing are as follows:
-
- `kylin.engine.spark-conf.spark.master`: specifies the Spark running mode. The default value is `yarn`
- `kylin.engine.spark-conf.spark.submit.deployMode`: specifies the deployment mode of Spark on YARN. The default value is `cluster`
- `kylin.engine.spark-conf.spark.yarn.queue`: specifies the Spark resource queue. The default value is `default`
- `kylin.engine.spark-conf.spark.driver.memory`: specifies the Spark Driver memory size. The default value is 2G
- `kylin.engine.spark-conf.spark.executor.memory`: specifies the Spark Executor memory size. The default value is 4G
-- `kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead`: specifies the Spark Executor off-heap memory size. The default value is 1024(M)
+- `kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead`: specifies the Spark Executor off-heap memory size. The default value is 1024(MB)
- `kylin.engine.spark-conf.spark.executor.cores`: specifies the number of cores available to a single Spark Executor. The default value is 1
- `kylin.engine.spark-conf.spark.network.timeout`: specifies the Spark network timeout. The default value is 600
- `kylin.engine.spark-conf.spark.executor.instances`: specifies the number of Spark Executors an Application owns. The default value is 1
@@ -371,14 +368,12 @@ Kylin supports using Spark as the Cube build engine. For details, please refer to [Build
- `kylin.engine.spark-conf-mergedict.spark.executor.memory`: requests more memory for merging dictionaries. The default value is 6G
- `kylin.engine.spark-conf-mergedict.spark.memory.fraction`: the percentage of memory reserved for the system. The default value is 0.2
+> Tip: For more information, please refer to [Build Cube with Spark](/docs/tutorial/cube_spark.html).
### Spark Dynamic Resource Allocation {#dynamic-allocation}
-For a detailed introduction to Spark dynamic resource allocation, please refer to the official documentation: [Dynamic Resource Allocation](http://spark.apache.org/docs/1.6.2/job-scheduling.html#dynamic-resource-allocation).
-Enabling Spark dynamic resource allocation requires changing the cluster's resource manager configuration, which differs by resource manager (YARN, Mesos or Standalone); in addition, the following needs to be configured in `kylin.properties`:
-
- `kylin.engine.spark-conf.spark.shuffle.service.enabled`: whether to enable the shuffle service
- `kylin.engine.spark-conf.spark.dynamicAllocation.enabled`: whether to enable Spark dynamic resource allocation
- `kylin.engine.spark-conf.spark.dynamicAllocation.initialExecutors`: the initial number of Executors requested when all Executors have been removed and a restart is requested
@@ -386,6 +381,8 @@ For a detailed introduction to Spark dynamic resource allocation, please refer to the official
- `kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors`: the maximum number of Executors to request
- `kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout`: an Executor idle for longer than this value is removed unless it holds cached data. The default value is 60(s)
+> Tip: For more information, please refer to [Dynamic Resource Allocation](http://spark.apache.org/docs/1.6.2/job-scheduling.html#dynamic-resource-allocation).
+
### Job-related {#job-config}
@@ -408,7 +405,7 @@ For a detailed introduction to Spark dynamic resource allocation, please refer to the official
- `kylin.job.notification-enabled`: whether to send email notifications when a job succeeds or fails. The default value is FALSE
- `kylin.job.notification-mail-enable-starttls`: whether to enable starttls. The default value is FALSE
- `kylin.job.notification-mail-host`: specifies the SMTP server address for mail
-- `kylin.job.notification-mail-port`: specifies the SMTP server port for mail. The default is 25
+- `kylin.job.notification-mail-port`: specifies the SMTP server port for mail. The default value is 25
- `kylin.job.notification-mail-username`: specifies the mail login username
- `kylin.job.notification-mail-password`: specifies the password of the mail username
- `kylin.job.notification-mail-sender`: specifies the sender email address
@@ -471,8 +468,9 @@ Kylin can use three types of compression: HBase table compression, Hive output
* HBase table compression
-This compression is configured via `kylin.hbase.default.compression.codec` in `kylin.properties`. Optional values are `none`, `snappy`, `lzo`, `gzip` and `lz4`. The default value is `none`, i.e. data is not compressed.
-> **Note**: Before changing the compression algorithm, make sure your HBase cluster supports the chosen algorithm.
+This compression is configured via `kylin.hbase.default.compression.codec` in `kylin.properties`. Optional values are (`none` | `snappy` | `lzo` | `gzip` | `lz4`). The default value is `none`, i.e. data is not compressed.
+
+> **Note**: Before changing the compression algorithm, make sure the user's HBase cluster supports the chosen algorithm.
* Hive output compression
@@ -494,7 +492,7 @@ Kylin can use three types of compression: HBase table compression, Hive output
* MapReduce job output compression
-This compression is configured via `kylin_job_conf.xml` and `kylin_job_conf_inmem.xml`. The default is empty, i.e. MapReduce's default configuration is used. To override it, add (or replace) the following properties in `kylin_job_conf.xml` and `kylin_job_conf_inmem.xml`. Take SNAPPY compression as an example:
+This compression is configured via `kylin_job_conf.xml` and `kylin_job_conf_inmem.xml`. The default value is empty, i.e. MapReduce's default configuration is used. To override it, add (or replace) the following properties in `kylin_job_conf.xml` and `kylin_job_conf_inmem.xml`. Take SNAPPY compression as an example:
```xml
<property>
@@ -525,8 +523,8 @@ Kylin can use three types of compression: HBase table compression, Hive output
- `kylin.query.security.table-acl-enabled`: whether to check the ACL of the corresponding table at query time. The default value is TRUE
- `kylin.query.calcite.extras-props.conformance`: whether to parse strictly. The default value is LENIENT
- `kylin.query.calcite.extras-props.caseSensitive`: whether case sensitive. The default value is TRUE
-- `kylin.query.calcite.extras-props.unquotedCasing`: whether to case-convert query statements. Optional values are ( UNCHANGED|TO_UPPER|TO_LOWER ). The default value is TO_UPPER, i.e. all upper case
-- `kylin.query.calcite.extras-props.quoting`: whether to add quotes. Optional values are ( DOUBLE_QUOTE|BACK_TICK|BRACKET). The default value is DOUBLE_QUOTE
+- `kylin.query.calcite.extras-props.unquotedCasing`: whether to case-convert query statements. Optional values are (`UNCHANGED` | `TO_UPPER` | `TO_LOWER`). The default value is `TO_UPPER`, i.e. all upper case
+- `kylin.query.calcite.extras-props.quoting`: whether to add quotes. Optional values are (`DOUBLE_QUOTE` | `BACK_TICK` | `BRACKET`). The default value is `DOUBLE_QUOTE`
- `kylin.query.statement-cache-max-num`: the maximum number of cached PreparedStatements. The default value is 50000
- `kylin.query.statement-cache-max-num-per-key`: the maximum number of cached PreparedStatements per key. The default value is 50
- `kylin.query.enable-dict-enumerator`: whether to enable the dictionary enumerator. The default value is FALSE
@@ -553,7 +551,7 @@ Kylin can use three types of compression: HBase table compression, Hive output
### Query Limits {#query-limit}
-- `kylin.query.timeout-seconds`: sets the query timeout; values below 60 are forced up to 60 seconds
+- `kylin.query.timeout-seconds`: sets the query timeout. The default value is 0, i.e. no limit; values below 60 are forced up to 60 seconds
- `kylin.query.timeout-seconds-coefficient`: sets the coefficient of the query timeout seconds. The default value is 0.5
- `kylin.query.max-scan-bytes`: sets the upper limit of bytes scanned by a query. The default value is 0, i.e. no limit
- `kylin.storage.partition.max-scan-bytes`: sets the maximum number of bytes scanned by a query. The default value is 3221225472 (bytes), i.e. 3GB
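For instance, a minimal sketch (the values are illustrative, not recommendations) of tightening these limits in `kylin.properties` might be:

```properties
# Cancel queries that run longer than 120 seconds
kylin.query.timeout-seconds=120
# Fail queries that would scan more than ~1 GB
kylin.query.max-scan-bytes=1073741824
```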
diff --git a/website/_docs/install/configuration.md
b/website/_docs/install/configuration.md
index 0fe21625b1..b95d3816d5 100644
--- a/website/_docs/install/configuration.md
+++ b/website/_docs/install/configuration.md
@@ -5,252 +5,630 @@ categories: install
permalink: /docs/install/configuration.html
---
-Kylin detects Hadoop/Hive/HBase configurations from the environments
automatically, for example the "core-site.xml", the "hbase-site.xml" and
others. Besides, Kylin has its own configurations, managed in the "conf" folder.
-
-{% highlight Groff markup %}
--bash-4.1# ls -l $KYLIN_HOME/conf
-
-kylin_hive_conf.xml
-kylin_job_conf_inmem.xml
-kylin_job_conf.xml
-kylin-kafka-consumer.xml
-kylin.properties
-kylin-server-log4j.properties
-kylin-tools-log4j.properties
-setenv.sh
-{% endhighlight %}
-
-## kylin_hive_conf.xml
-
-The Hive configurations that Kylin applied when fetching data from Hive.
-
-## kylin_job_conf.xml and kylin_job_conf_inmem.xml
-
-Hadoop MR configurations when Kylin run MapReduce jobs. The
"kylin_job_conf_inmem.xml" one requests more memory for mapper, used for
Kylin's "In-mem cubing" job.
-
-## kylin-kafka-consumer.xml
-
-Kafka configurations when Kylin fetching data from Kafka brokers.
-
-
-## kylin-server-log4j.properties
-
-Kylin server's log configurations.
-
-## kylin-tools-log4j.properties
-
-Kylin command line's log configurations.
-
-## setenv.sh
-
-Shell script to set environment variables. It will be invoked in "kylin.sh"
and other scripts in "bin" folder. Typically, you can adjust the Kylin JVM heap
size here, and set "KAFKA_HOME" and other environment variables.
-
-## kylin.properties
-
-The main configuration file of Kylin.
-
-
-| Key | Default value | Description | Overwritten at Cube |
-| --- | --- | --- | --- |
-| kylin.env | Dev | Whether this env is a Dev, QA, or Prod environment | No |
-| kylin.env.hdfs-working-dir | /kylin | Working directory on HDFS | No |
-| kylin.env.zookeeper-base-path | /kylin | Path on ZK | No |
-| kylin.env.zookeeper-connect-string | | ZK connection string; If blank, use HBase's ZK | No |
-| kylin.env.zookeeper-acl-enabled | false | | No |
-| kylin.env.zookeeper.zk-auth | digest:ADMIN:KYLIN | | No |
-| kylin.env.zookeeper.zk-acl | world:anyone:rwcda | | No |
-| kylin.metadata.dimension-encoding-max-length | 256 | Max length for one dimension's encoding | Yes |
-| kylin.metadata.url | kylin_metadata@hbase | Kylin metadata storage | No |
-| kylin.metadata.sync-retries | 3 | | No |
-| kylin.metadata.sync-error-handler | | | No |
-| kylin.metadata.check-copy-on-write | false | | No |
-| kylin.metadata.hbase-client-scanner-timeout-period | 10000 | | No |
-| kylin.metadata.hbase-rpc-timeout | 5000 | | No |
-| kylin.metadata.hbase-client-retries-number | 1 | | No |
-| kylin.metadata.jdbc.dialect | mysql | clarify the type of dialect | Yes |
-| kylin.metadata.resource-store-provider.jdbc | org.apache.kylin.common.persistence.JDBCResourceStore | specify the class that jdbc used | |
-| kylin.metadata.jdbc.json-always-small-cell | true | | Yes |
-| kylin.metadata.jdbc.small-cell-meta-size-warning-threshold | 100mb | | Yes |
-| kylin.metadata.jdbc.small-cell-meta-size-error-threshold | 1gb | | Yes |
-| kylin.metadata.jdbc.max-cell-size | 1mb | | Yes |
-| kylin.dictionary.use-forest-trie | true | | No |
-| kylin.dictionary.forest-trie-max-mb | 500 | | No |
-| kylin.dictionary.max-cache-entry | 3000 | | No |
-| kylin.dictionary.growing-enabled | false | | No |
-| kylin.dictionary.append-entry-size | 10000000 | | No |
-| kylin.dictionary.append-max-versions | 3 | | No |
-| kylin.dictionary.append-version-ttl | 259200000 | | No |
-| kylin.dictionary.reusable | false | Whether reuse dict | Yes |
-| kylin.dictionary.shrunken-from-global-enabled | false | Whether shrink global dict | Yes |
-| kylin.snapshot.max-cache-entry | 500 | | No |
-| kylin.snapshot.max-mb | 300 | | No |
-| kylin.snapshot.ext.shard-mb | 500 | | No |
-| kylin.snapshot.ext.local.cache.path | lookup_cache | | No |
-| kylin.snapshot.ext.local.cache.max-size-gb | 200 | | No |
-| kylin.cube.size-estimate-ratio | 0.25 | | Yes |
-| kylin.cube.size-estimate-memhungry-ratio | 0.05 | Deprecated | Yes |
-| kylin.cube.size-estimate-countdistinct-ratio | 0.5 | | Yes |
-| kylin.cube.size-estimate-topn-ratio | 0.5 | | Yes |
-| kylin.cube.algorithm | auto | Cubing algorithm for MR engine, other options: layer, inmem | Yes |
-| kylin.cube.algorithm.layer-or-inmem-threshold | 7 | | Yes |
-| kylin.cube.algorithm.inmem-split-limit | 500 | | Yes |
-| kylin.cube.algorithm.inmem-concurrent-threads | 1 | | Yes |
-| kylin.cube.ignore-signature-inconsistency | false | | |
-| kylin.cube.aggrgroup.max-combination | 32768 | Max cuboid numbers in a Cube | Yes |
-| kylin.cube.aggrgroup.is-mandatory-only-valid | false | Whether allow a Cube only has the base cuboid. | Yes |
-| kylin.cube.cubeplanner.enabled | true | Whether enable cubeplanner | Yes |
-| kylin.cube.cubeplanner.enabled-for-existing-cube | true | Whether enable cubeplanner for existing cube | Yes |
-| kylin.cube.cubeplanner.algorithm-threshold-greedy | 8 | | Yes |
-| kylin.cube.cubeplanner.expansion-threshold | 15.0 | | Yes |
-| kylin.cube.cubeplanner.recommend-cache-max-size | 200 | | No |
-| kylin.cube.cubeplanner.mandatory-rollup-threshold | 1000 | | Yes |
-| kylin.cube.cubeplanner.algorithm-threshold-genetic | 23 | | Yes |
-| kylin.cube.rowkey.max-size | 63 | Max columns in Rowkey | No |
-| kylin.cube.max-building-segments | 10 | Max building segments in one Cube | Yes |
-| kylin.cube.allow-appear-in-multiple-projects | false | Whether allow a Cube appeared in multiple projects | No |
-| kylin.cube.gtscanrequest-serialization-level | 1 | | |
-| kylin.cube.is-automerge-enabled | true | Whether enable auto merge. | Yes |
-| kylin.job.log-dir | /tmp/kylin/logs | | |
-| kylin.job.allow-empty-segment | true | Whether tolerant data source is empty. | Yes |
-| kylin.job.max-concurrent-jobs | 10 | Max concurrent running jobs | No |
-| kylin.job.sampling-percentage | 100 | Data sampling percentage, to calculate Cube statistics; Default be all. | Yes |
-| kylin.job.notification-enabled | false | Whether send email notification on job error/succeed. | No |
-| kylin.job.notification-mail-enable-starttls | false | | No |
-| kylin.job.notification-mail-port | 25 | | No |
-| kylin.job.notification-mail-host | | | No |
-| kylin.job.notification-mail-username | | | No |
-| kylin.job.notification-mail-password | | | No |
-| kylin.job.notification-mail-sender | | | No |
-| kylin.job.notification-admin-emails | | | No |
-| kylin.job.retry | 0 | | No |
-| | | | |
-| kylin.job.scheduler.priority-considered | false | | No |
-| kylin.job.scheduler.priority-bar-fetch-from-queue | 20 | | No |
-| kylin.job.scheduler.poll-interval-second | 30 | | No |
-| kylin.job.error-record-threshold | 0 | | No |
-| kylin.job.cube-auto-ready-enabled | true | Whether enable the cube automatically when finish build | Yes |
-| kylin.source.hive.keep-flat-table | false | Whether keep the intermediate Hive table after job finished. | No |
-| kylin.source.hive.database-for-flat-table | default | Hive database to create the intermediate table. | No |
-| kylin.source.hive.flat-table-storage-format | SEQUENCEFILE | | No |
-| kylin.source.hive.flat-table-field-delimiter | \u001F | | No |
-| kylin.source.hive.redistribute-flat-table | true | Whether or not to redistribute the flat table. | Yes |
-| kylin.source.hive.redistribute-column-count | 3 | The number of redistribute column | Yes |
-| kylin.source.hive.client | cli | | No |
-| kylin.source.hive.beeline-shell | beeline | | No |
-| kylin.source.hive.beeline-params | | | No |
-| kylin.source.hive.enable-sparksql-for-table-ops | false | | No |
-| kylin.source.hive.sparksql-beeline-shell | | | No |
-| kylin.source.hive.sparksql-beeline-params | | | No |
-| kylin.source.hive.table-dir-create-first | false | | No |
-| kylin.source.hive.flat-table-cluster-by-dict-column | | | |
-| kylin.source.hive.default-varchar-precision | 256 | | No |
-| kylin.source.hive.default-char-precision | 255 | | No |
-| kylin.source.hive.default-decimal-precision | 19 | | No |
-| kylin.source.hive.default-decimal-scale | 4 | | No |
-| kylin.source.jdbc.connection-url | | | |
-| kylin.source.jdbc.driver | | | |
-| kylin.source.jdbc.dialect | default | | |
-| kylin.source.jdbc.user | | | |
-| kylin.source.jdbc.pass | | | |
-| kylin.source.jdbc.sqoop-home | | | |
-| kylin.source.jdbc.sqoop-mapper-num | 4 | | |
-| kylin.source.jdbc.field-delimiter | \| | | |
-| kylin.storage.default | 2 | | No |
-| kylin.storage.hbase.table-name-prefix | KYLIN_ | | No |
-| kylin.storage.hbase.namespace | default | | No |
-| kylin.storage.hbase.cluster-fs | | | |
-| kylin.storage.hbase.cluster-hdfs-config-file | | | |
-| kylin.storage.hbase.coprocessor-local-jar | | | |
-| kylin.storage.hbase.min-region-count | 1 | | |
-| kylin.storage.hbase.max-region-count | 500 | | |
-| kylin.storage.hbase.hfile-size-gb | 2.0 | | |
-| kylin.storage.hbase.run-local-coprocessor | false | | |
-| kylin.storage.hbase.coprocessor-mem-gb | 3.0 | | |
-| kylin.storage.partition.aggr-spill-enabled | true | | |
-| kylin.storage.partition.max-scan-bytes | 3221225472 | | |
-| kylin.storage.hbase.coprocessor-timeout-seconds | 0 | | |
-| kylin.storage.hbase.max-fuzzykey-scan | 200 | | |
-| kylin.storage.hbase.max-fuzzykey-scan-split | 1 | | |
-| kylin.storage.hbase.max-visit-scanrange | 1000000 | | |
-| kylin.storage.hbase.scan-cache-rows | 1024 | | |
-| kylin.storage.hbase.region-cut-gb | 5.0 | | |
-| kylin.storage.hbase.max-scan-result-bytes | 5242880 | | |
-| kylin.storage.hbase.compression-codec | none | | |
-| kylin.storage.hbase.rowkey-encoding | FAST_DIFF | | |
-| kylin.storage.hbase.block-size-bytes | 1048576 | | |
-| kylin.storage.hbase.small-family-block-size-bytes | 65536 | | |
-| kylin.storage.hbase.owner-tag | | | |
-| kylin.storage.hbase.endpoint-compress-result | true | | |
-| kylin.storage.hbase.max-hconnection-threads | 2048 | | |
-| kylin.storage.hbase.core-hconnection-threads | 2048 | | |
-| kylin.storage.hbase.hconnection-threads-alive-seconds | 60 | | |
-| kylin.storage.hbase.replication-scope | 0 | whether config hbase cluster replication | Yes |
-| kylin.engine.mr.lib-dir | | | |
-| kylin.engine.mr.reduce-input-mb | 500 | | |
-| kylin.engine.mr.reduce-count-ratio | 1.0 | | |
-| kylin.engine.mr.min-reducer-number | 1 | | |
-| kylin.engine.mr.max-reducer-number | 500 | | |
-| kylin.engine.mr.mapper-input-rows | 1000000 | | |
-| kylin.engine.mr.max-cuboid-stats-calculator-number | 1 | | |
-| kylin.engine.mr.uhc-reducer-count | 1 | | |
-| kylin.engine.mr.build-uhc-dict-in-additional-step | false | | |
-| kylin.engine.mr.build-dict-in-reducer | true | | |
-| kylin.engine.mr.yarn-check-interval-seconds | 10 | | |
-| kylin.env.hadoop-conf-dir | | Hadoop conf directory; If not specified, parse from environment. | No |
-| kylin.engine.spark.rdd-partition-cut-mb | 10.0 | Spark Cubing RDD partition split size. | Yes |
-| kylin.engine.spark.min-partition | 1 | Spark Cubing RDD min partition number | Yes |
-| kylin.engine.spark.max-partition | 5000 | RDD max partition number | Yes |
-| kylin.engine.spark.storage-level | MEMORY_AND_DISK_SER | RDD persistent level. | Yes |
-| kylin.engine.spark-conf.spark.hadoop.dfs.replication | 2 | | |
-| kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress | true | | |
-| kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec | org.apache.hadoop.io.compress.DefaultCodec | | |
-| kylin.engine.spark-conf-mergedict.spark.executor.memory | 6G | | Yes |
-| kylin.engine.spark-conf-mergedict.spark.memory.fraction | 0.2 | | Yes |
-| kylin.query.skip-empty-segments | true | Whether directly skip empty segment (metadata shows size be 0) when run SQL query. | Yes |
-| kylin.query.force-limit | -1 | | |
-| kylin.query.max-scan-bytes | 0 | | |
-| kylin.query.max-return-rows | 5000000 | | |
-| kylin.query.large-query-threshold | 1000000 | | |
-| kylin.query.cache-threshold-duration | 2000 | | |
-| kylin.query.cache-threshold-scan-count | 10240 | | |
-| kylin.query.cache-threshold-scan-bytes | 1048576 | | |
-| kylin.query.security-enabled | true | | |
-| kylin.query.cache-enabled | true | | |
-| kylin.query.timeout-seconds | 0 | | |
-| kylin.query.timeout-seconds-coefficient | 0.5 | the coefficient to control query timeout seconds | Yes |
-| kylin.query.pushdown.runner-class-name | | | |
-| kylin.query.pushdown.update-enabled | false | | |
-| kylin.query.pushdown.cache-enabled | false | | |
-| kylin.query.pushdown.jdbc.url | | | |
-| kylin.query.pushdown.jdbc.driver | | | |
-| kylin.query.pushdown.jdbc.username | | | |
-| kylin.query.pushdown.jdbc.password | | | |
-| kylin.query.pushdown.jdbc.pool-max-total | 8 | | |
-| kylin.query.pushdown.jdbc.pool-max-idle | 8 | | |
-| kylin.query.pushdown.jdbc.pool-min-idle | 0 | | |
-| kylin.query.security.table-acl-enabled | true | | No |
-| kylin.query.calcite.extras-props.conformance | LENIENT | | Yes |
-| kylin.query.calcite.extras-props.caseSensitive | true | Whether enable case sensitive | Yes |
-| kylin.query.calcite.extras-props.unquotedCasing | TO_UPPER | Options: UNCHANGED, TO_UPPER, TO_LOWER | Yes |
-| kylin.query.calcite.extras-props.quoting | DOUBLE_QUOTE | Options: DOUBLE_QUOTE, BACK_TICK, BRACKET | Yes |
-| kylin.query.statement-cache-max-num | 50000 | Max number for cache query statement | Yes |
-| kylin.query.statement-cache-max-num-per-key | 50 | | Yes |
-| kylin.query.enable-dict-enumerator | false | Whether enable dict enumerator | Yes |
-| kylin.query.enable-dynamic-column | false | | No |
-| kylin.server.mode | all | Kylin node mode: all\|job\|query. | No |
-| kylin.server.cluster-servers | localhost:7070 | | No |
-| kylin.server.cluster-name | | | No |
-| kylin.server.query-metrics-enabled | false | | No |
-| kylin.server.query-metrics2-enabled | false | | No |
-| kylin.server.auth-user-cache.expire-seconds | 300 | | No |
-| kylin.server.auth-user-cache.max-entries | 100 | | No |
-| kylin.server.external-acl-provider | | | No |
-| kylin.security.ldap.user-search-base | | | No |
-| kylin.security.ldap.user-group-search-base | | | No |
-| kylin.security.acl.admin-role | | | No |
-| kylin.web.timezone | PST | | No |
-| kylin.web.cross-domain-enabled | true | | No |
-| kylin.web.export-allow-admin | true | | No |
-| kylin.web.export-allow-other | true | | No |
-| kylin.web.dashboard-enabled | false | | No |
+- [Configuration Files and Overriding](#kylin-config)
+ - [Kylin Configuration Files](#kylin-config)
+ - [Configuration Overriding](#config-override)
+ - [Project-level Configuration
Overriding](#project-config-override)
+ - [Cube-level Configuration Overriding](#cube-config-override)
+ - [MapReduce Configuration Overriding](#mr-config-override)
+ - [Hive Configuration Overriding](#hive-config-override)
+ - [Spark Configuration Overriding](#spark-config-override)
+- [Deployment configuration](#kylin-deploy)
+ - [Deploy Kylin](#deploy-config)
+ - [Job Engine HA](#job-engine-ha)
+ - [Allocate More Memory for Kylin](#kylin-jvm-settings)
+ - [RESTful Webservice](#rest-config)
+- [Metastore Configuration](#kylin_metastore)
+ - [Metadata](#metadata)
+ - [MySQL Metastore Configuration (Beta)](#mysql-metastore)
+- [Modeling Configuration](#kylin-build)
+ - [Hive Client and SparkSQL](#hive-client-and-sparksql)
+ - [JDBC Datasource Configuration](#jdbc-datasource)
+ - [Data Type Precision](#precision-config)
+ - [Cube Design](#cube-config)
+ - [Cube Size Estimation](#cube-estimate)
+ - [Cube Algorithm](#cube-algorithm)
+ - [Auto Merge Segments](#auto-merge)
+ - [Lookup Table Snapshot](#snapshot)
+ - [Build Cube](#cube-build)
+ - [Dictionary-related](#dict-config)
+ - [Deal with Ultra-High-Cardinality Columns](#uhc-config)
+ - [Spark as Build Engine](#spark-cubing)
+ - [Spark Dynamic Allocation](#dynamic-allocation)
+ - [Job-related](#job-config)
+ - [Enable Email Notification](#email-notification)
+ - [Enable Cube Planner](#cube-planner)
+ - [HBase Storage](#hbase-config)
+ - [Enable Compression](#compress-config)
+- [Query Configuration](#kylin-query)
+ - [Query-related](#query-config)
+ - [Fuzzy Query](#fuzzy)
+ - [Query Cache](#cache-config)
+ - [Query Limits](#query-limit)
+ - [Query Pushdown](#query-pushdown)
+ - [Query rewriting](#convert-sql)
+ - [Collect Query Metrics to JMX](#jmx-metrics)
+ - [Collect Query Metrics to dropwizard](#dropwizard-metrics)
+- [Security Configuration](#kylin-security)
+ - [Integrated LDAP for SSO](#ldap-sso)
+ - [Integrate with Apache Ranger](#ranger)
+ - [Enable ZooKeeper ACL](#zookeeper-acl)
+
+
+
+
+### Configuration Files and Overriding {#kylin-config}
+
+This section introduces Kylin's configuration files and how to perform
Configuration Overriding.
+
+
+
+### Kylin Configuration Files {#kylin-config-file}
+
+Kylin will automatically read the Hadoop configuration (`core-site.xml`), Hive configuration (`hive-site.xml`) and HBase configuration (`hbase-site.xml`) from the environment; in addition, Kylin's own configuration files are in the `$KYLIN_HOME/conf/` directory.
+Kylin's configuration files are as follows:
+
+- `kylin_hive_conf.xml`: This file contains the configuration for the Hive job.
+- `kylin_job_conf.xml` & `kylin_job_conf_inmem.xml`: These files contain the configuration for MapReduce jobs. When performing the *In-mem Cubing* job, users need to request more memory for the mapper in `kylin_job_conf_inmem.xml`
+- `kylin-kafka-consumer.xml`: This file contains the configuration for the
Kafka job.
+- `kylin-server-log4j.properties`: This file contains the log configuration
for the Kylin server.
+- `kylin-tools-log4j.properties`: This file contains the log configuration for
the Kylin command line.
+- `setenv.sh` : This file is a shell script for setting environment variables. Users can adjust the Kylin JVM heap size with `KYLIN_JVM_SETTINGS` and set other environment variables such as `KAFKA_HOME`.
+- `kylin.properties`: This file contains Kylin global configuration.
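+For orientation, listing the directory should show roughly these files (a sketch; the exact contents can vary by version):
+
+```sh
+ls $KYLIN_HOME/conf
+# kylin_hive_conf.xml  kylin_job_conf.xml  kylin_job_conf_inmem.xml
+# kylin-kafka-consumer.xml  kylin.properties
+# kylin-server-log4j.properties  kylin-tools-log4j.properties  setenv.sh
+```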
+
+
+
+### Configuration Overriding {#config-override}
+
+Some configuration properties in `$KYLIN_HOME/conf/` can be overridden in the Web UI. Configuration overriding has two scopes: *project level* and *Cube level*. The priority order can be stated as: Cube-level configurations > project-level configurations > configuration files.
+
+
+
+### Project-level Configuration Overriding {#project-config-override}
+
+Click *Manage Project* in the Web UI, select a project, then click *Edit* -> *Project Config* -> *+ Property* to add configuration properties that override the values in the configuration files, as shown in the figure below:
+
+
+
+
+### Cube-level Configuration Overriding {#cube-config-override}
+
+In the *Configuration overrides* step of *Cube Designer*, users can override property values set at the project level or in the configuration files, as shown in the figure below:
+
+
+
+
+### MapReduce Configuration Overriding {#mr-config-override}
+
+Kylin supports overriding configuration properties in `kylin_job_conf.xml` and
`kylin_job_conf_inmem.xml` at the project and cube level, in the form of
key-value pairs, in the following format:
+`kylin.job.mr.config.override.<key> = <value>`
+If users want the Cube's build job to use a different YARN resource queue, they can set:
+`kylin.engine.mr.config-override.mapreduce.job.queuename={queueName}`
+
+
+
+### Hive Configuration Overriding {#hive-config-override}
+
+Kylin supports overriding configuration properties in `kylin_hive_conf.xml` at
the project and cube level, in the form of key-value pairs, in the following
format:
+`kylin.source.hive.config-override.<key> = <value>`
+If users want Hive to use a different YARN resource queue, they can set:
+`kylin.source.hive.config-override.mapreduce.job.queuename={queueName}`
+
+
+
+### Spark Configuration Overriding {#spark-config-override}
+
+Kylin supports overriding configuration properties in `kylin.properties` at
the project and cube level, in the form of key-value pairs, in the following
format:
+`kylin.engine.spark-conf.<key> = <value>`
+If users want Spark to use a different YARN resource queue, they can set:
+`kylin.engine.spark-conf.spark.yarn.queue={queueName}`
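+As a concrete sketch (the property values here are hypothetical, not recommendations), a Cube-level override could carry entries such as:
+
+```properties
+# Give the mappers of this Cube's MapReduce build jobs more memory
+kylin.engine.mr.config-override.mapreduce.map.memory.mb=4096
+# Make Hive compress its intermediate output for this Cube
+kylin.source.hive.config-override.hive.exec.compress.output=true
+# Enlarge the Spark executors used to build this Cube
+kylin.engine.spark-conf.spark.executor.memory=8G
+```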
+
+
+
+### Deployment configuration {#kylin-deploy}
+
+This section introduces Kylin Deployment related configuration.
+
+
+
+### Deploy Kylin {#deploy-config}
+
+- `kylin.env.hdfs-working-dir`: specifies the HDFS path used by Kylin service.
The default value is `/kylin`. Make sure that the user who starts the Kylin
instance has permission to read and write to this directory.
+- `kylin.env`: specifies the purpose of the Kylin deployment. Optional values
include `DEV`, `QA` and `PROD`. The default value is *DEV*. Some developer
functions will be enabled in *DEV* mode.
+- `kylin.env.zookeeper-base-path`: specifies the ZooKeeper path used by the
Kylin service. The default value is `/kylin`
+- `kylin.env.zookeeper-connect-string`: specifies the ZooKeeper connection
string. If it is empty, use HBase's ZooKeeper
+- `kylin.env.hadoop-conf-dir`: specifies the Hadoop configuration file
directory. If not specified, get `HADOOP_CONF_DIR` in the environment.
+- `kylin.server.mode`: Optional values include `all`, `job` and `query`, among
them *all* is the default one. *job* mode means the instance schedules Cube job
only; *query* mode means the instance serves SQL queries only; *all* mode means
the instance handles both of them.
+- `kylin.server.cluster-name`: specifies the cluster name
+
+
+
+### Read and write separation configuration {#rw-deploy}
+
+- `kylin.storage.hbase.cluster-fs`: specifies the HDFS file system of the
HBase cluster
+- `kylin.storage.hbase.cluster-hdfs-config-file`: specifies HDFS configuration
file pointing to the HBase cluster
+
+> Tip: For more information, please refer to [Deploy Apache Kylin with
Standalone HBase
Cluster](http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/)
+
+
+
+### Allocate More Memory for Kylin {#kylin-jvm-settings}
+
+Two sample settings for `KYLIN_JVM_SETTINGS` are given in `$KYLIN_HOME/conf/setenv.sh`.
+The default setting uses relatively little memory. You can comment it out and then uncomment the next line to allocate more memory for Kylin. The default configuration is:
+
+```shell
+export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
+# export KYLIN_JVM_SETTINGS="-Xms16g -Xmx16g -XX:MaxPermSize=512m -XX:NewSize=3g -XX:MaxNewSize=3g -XX:SurvivorRatio=4 -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
+```
+
+
+
+### RESTful Webservice {#rest-config}
+
+- `kylin.web.timezone`: specifies the time zone used by Kylin's REST service.
The default value is GMT+8.
+- `kylin.web.cross-domain-enabled`: whether cross-domain access is supported.
The default value is TRUE
+- `kylin.web.export-allow-admin`: whether administrator users can export information. The default value is TRUE
+- `kylin.web.export-allow-other`: whether other users can export information. The default value is TRUE
+- `kylin.web.dashboard-enabled`: whether to enable Dashboard. The default
value is FALSE
+
+
+
+### Metastore Configuration {#kylin_metastore}
+
+This section introduces Kylin Metastore related configuration.
+
+
+
+### Metadata {#metadata}
+
+- `kylin.metadata.url`: specifies the Metadata path. The default value is
`kylin_metadata@hbase`
+- `kylin.metadata.dimension-encoding-max-length`: specifies the maximum length
when the dimension is used as Rowkeys with fix_length encoding. The default
value is 256.
+- `kylin.metadata.sync-retries`: specifies the number of Metadata sync
retries. The default value is 3.
+- `kylin.metadata.sync-error-handler`: The default value is
`DefaultSyncErrorHandler`
+- `kylin.metadata.check-copy-on-write`: whether to clear the metadata cache. The default value is `FALSE`
+- `kylin.metadata.hbase-client-scanner-timeout-period`: specifies the total timeout between an HBase client initiating a scan RPC call and receiving its response. The default value is 10000 (ms).
+- `kylin.metadata.hbase-rpc-timeout`: specifies the timeout for HBase to
perform RPC operations. The default value is 5000 (ms).
+- `kylin.metadata.hbase-client-retries-number`: specifies the number of HBase
retries. The default value is 1 (times).
+- `kylin.metadata.resource-store-provider.jdbc`: specifies the class used by
JDBC. The default value is
`org.apache.kylin.common.persistence.JDBCResourceStore`
+
+
+
+### MySQL Metastore Configuration (Beta) {#mysql-metastore}
+
+> *Note*: This feature is still being tested and it is recommended to use it
with caution.
+
+- `kylin.metadata.url`: specifies the metadata path
+- `kylin.metadata.jdbc.dialect`: specifies JDBC dialect
+- `kylin.metadata.jdbc.json-always-small-cell`: The default value is TRUE
+- `kylin.metadata.jdbc.small-cell-meta-size-warning-threshold`: The default
value is 100 (MB)
+- `kylin.metadata.jdbc.small-cell-meta-size-error-threshold`: The default
value is 1 (GB)
+- `kylin.metadata.jdbc.max-cell-size`: The default value is 1 (MB)
+- `kylin.metadata.resource-store-provider.jdbc`: specifies the class used by JDBC. The default value is `org.apache.kylin.common.persistence.JDBCResourceStore`
+
+> Tip: For more information, please refer to [MySQL-based Metastore
Configuration](/docs/tutorial/mysql_metastore.html)
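+
+As an illustrative sketch only (host, database and credentials are hypothetical; the tutorial above is the authoritative reference for the exact format), a JDBC metastore URL in `kylin.properties` could look like:
+
+```properties
+kylin.metadata.url=kylin_metadata@jdbc,url=jdbc:mysql://localhost:3306/kylin,username=kylin,password=kylin
+kylin.metadata.jdbc.dialect=mysql
+```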
+
+
+
+### Modeling Configuration {#kylin-build}
+
+This section introduces Kylin data modeling and build related configuration.
+
+
+
+### Hive Client and SparkSQL {#hive-client-and-sparksql}
+
+- `kylin.source.hive.client`: specifies the Hive command line type. Optional values include *cli* or *beeline*. The default value is *cli*.
+- `kylin.source.hive.beeline-shell`: specifies the absolute path of the Beeline shell. The default value is beeline
+- `kylin.source.hive.beeline-params`: when using Beeline as the client tool for Hive, users need to configure this parameter to provide more information to Beeline
+- `kylin.source.hive.enable-sparksql-for-table-ops`: the default value is *FALSE*, which needs to be set to *TRUE* when using SparkSQL
+- `kylin.source.hive.sparksql-beeline-shell`: when using SparkSQL Beeline as the client tool for Hive, users need to configure this parameter as /path/to/spark-client/bin/beeline
+- `kylin.source.hive.sparksql-beeline-params`: when using SparkSQL Beeline as the client tool for Hive, users need to configure this parameter to provide more information to SparkSQL
+
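+As a rough sketch (the connection URL and username are hypothetical; `-n` and `-u` are standard Beeline flags), switching Kylin to Beeline might look like this in `kylin.properties`:
+
+```properties
+kylin.source.hive.client=beeline
+kylin.source.hive.beeline-params=-n hive -u 'jdbc:hive2://localhost:10000'
+```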
+
+
+
+### JDBC Datasource Configuration {#jdbc-datasource}
+
+- `kylin.source.default`: specifies the type of data source used by JDBC
+- `kylin.source.jdbc.connection-url`: specifies JDBC connection string
+- `kylin.source.jdbc.driver`: specifies JDBC driver class name
+- `kylin.source.jdbc.dialect`: specifies JDBC dialect. The default value is
default
+- `kylin.source.jdbc.user`: specifies JDBC connection username
+- `kylin.source.jdbc.pass`: specifies JDBC connection password
+- `kylin.source.jdbc.sqoop-home`: specifies Sqoop installation path
+- `kylin.source.jdbc.sqoop-mapper-num`: specifies how many slices should be
split. Sqoop will run a mapper for each slice. The default value is 4.
+- `kylin.source.jdbc.field-delimiter`: specifies the field separator. The default value is \|
+
+> Tip: For more information, please refer to [Building a JDBC Data
Source](/docs/tutorial/setup_jdbc_datasource.html).
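+
+For example, a minimal sketch of a MySQL datasource (all values hypothetical; the tutorial above is authoritative) could be:
+
+```properties
+kylin.source.jdbc.connection-url=jdbc:mysql://localhost:3306/sales
+kylin.source.jdbc.driver=com.mysql.jdbc.Driver
+kylin.source.jdbc.dialect=mysql
+kylin.source.jdbc.user=kylin
+kylin.source.jdbc.pass=kylin
+kylin.source.jdbc.sqoop-home=/usr/local/sqoop
+```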
+
+
+
+
+### Data Type Precision {#precision-config}
+
+- `kylin.source.hive.default-varchar-precision`: specifies the maximum length
of the *varchar* field. The default value is 256.
+- `kylin.source.hive.default-char-precision`: specifies the maximum length of
the *char* field. The default value is 255.
+- `kylin.source.hive.default-decimal-precision`: specifies the precision of
the *decimal* field. The default value is 19
+- `kylin.source.hive.default-decimal-scale`: specifies the scale of the
*decimal* field. The default value is 4.
+
+
+
+### Cube Design {#cube-config}
+
+- `kylin.cube.ignore-signature-inconsistency`: The signature in Cube desc ensures that the Cube is not changed to a corrupt state. The default value is *FALSE*
+- `kylin.cube.aggrgroup.max-combination`: specifies the max combination number
of aggregation groups. The default value is 32768.
+- `kylin.cube.aggrgroup.is-mandatory-only-valid`: whether to allow Cube
contains only Base Cuboid. The default value is *FALSE*, set to *TRUE* when
using Spark Cubing
+- `kylin.cube.rowkey.max-size`: specifies the maximum number of columns that
can be set to Rowkeys. The default value is 63.
+- `kylin.cube.allow-appear-in-multiple-projects`: whether to allow a cube to
appear in multiple projects
+- `kylin.cube.gtscanrequest-serialization-level`: the default value is 1
+
+
+
+### Cube Size Estimation {#cube-estimate}
+
+Both Kylin and HBase use compression when writing to disk, so Kylin will
multiply its original size by the ratio to estimate the size of the cube.
+
+- `kylin.cube.size-estimate-ratio`: normal Cube. The default value is 0.25
+- `kylin.cube.size-estimate-memhungry-ratio`: deprecated. The default value is 0.05
+- `kylin.cube.size-estimate-countdistinct-ratio`: Cube size estimation with the count distinct measure. The default value is 0.5
+- `kylin.cube.size-estimate-topn-ratio`: Cube size estimation with the TopN measure. The default value is 0.5
+
+
+
+### Cube Algorithm {#cube-algorithm}
+
+- `kylin.cube.algorithm`: specifies the Cube build algorithm. Optional values include `auto`, `layer` and `inmem`. The default value is `auto`, that is, Kylin will dynamically select an algorithm (layer or inmem) by sampling data; users who know Kylin, their data and their cluster well can set the algorithm directly.
+- `kylin.cube.algorithm.layer-or-inmem-threshold`: the default value is 7
+- `kylin.cube.algorithm.inmem-split-limit`: the default value is 500
+- `kylin.cube.algorithm.inmem-concurrent-threads`: the default value is 1
+- `kylin.job.sampling-percentage`: specifies the data sampling percentage. The
default value is 100.
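+
+For instance, to pin the algorithm instead of letting Kylin choose (a sketch; `layer` is just one of the documented options, and the sampling value is illustrative), one could set:
+
+```properties
+kylin.cube.algorithm=layer
+# Sample only half of the data when computing Cube statistics
+kylin.job.sampling-percentage=50
+```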
+
+
+
+### Auto Merge Segments {#auto-merge}
+
+- `kylin.cube.is-automerge-enabled`: whether to enable auto-merge. The default
value is *TRUE*. When this parameter is set to *FALSE*, the auto-merge function
will be turned off, even if it is enabled in Cube Design.
+
+
+
+### Lookup Table Snapshot {#snapshot}
+
+- `kylin.snapshot.max-mb`: specifies the max size of a snapshot. The default value is 300(M)
+- `kylin.snapshot.max-cache-entry`: the maximum number of snapshots that can be stored in the cache. The default value is 500.
+- `kylin.snapshot.ext.shard-mb`: specifies the size of an HBase shard. The default value is 500(M).
+- `kylin.snapshot.ext.local.cache.path`: specifies the local cache path. The default value is lookup_cache
+- `kylin.snapshot.ext.local.cache.max-size-gb`: specifies the local snapshot cache size. The default value is 200(GB)
+
+
+
+### Build Cube {#cube-build}
+
+- `kylin.storage.default`: specifies the default build engine. The default
value is 2, which means HBase.
+- `kylin.source.hive.keep-flat-table`: whether to keep the Hive intermediate
table after the build job is complete. The default value is *FALSE*
+- `kylin.source.hive.database-for-flat-table`: specifies the name of the Hive
database that stores the Hive intermediate table. The default is *default*.
Make sure that the user who started the Kylin instance has permission to
operate the database.
+- `kylin.source.hive.flat-table-storage-format`: specifies the storage format
of the Hive intermediate table. The default value is *SEQUENCEFILE*
+- `kylin.source.hive.flat-table-field-delimiter`: specifies the delimiter of
the Hive intermediate table. The default value is *\u001F*
+- `kylin.source.hive.redistribute-flat-table`: whether to redistribute the
Hive flat table. The default value is *TRUE*
+- `kylin.source.hive.redistribute-column-count`: number of redistributed
columns. The default value is *3*
+- `kylin.source.hive.table-dir-create-first`: the default value is *FALSE*
+- `kylin.storage.partition.aggr-spill-enabled`: the default value is *TRUE*
+- `kylin.engine.mr.lib-dir`: specifies the path to the jar package used by the
MapReduce job
+- `kylin.engine.mr.reduce-input-mb`: used to estimate the number of Reducers.
The default value is 500(MB).
+- `kylin.engine.mr.reduce-count-ratio`: used to estimate the number of
Reducers. The default value is 1.0
+- `kylin.engine.mr.min-reducer-number`: specifies the minimum number of
Reducers in the MapReduce job. The default is 1
+- `kylin.engine.mr.max-reducer-number`: specifies the maximum number of
Reducers in the MapReduce job. The default is 500.
+- `kylin.engine.mr.mapper-input-rows`: specifies the number of rows that each Mapper can handle. The default value is 1000000. If users lower this value, more Mappers will be started.
+- `kylin.engine.mr.max-cuboid-stats-calculator-number`: specifies the number
of threads used to calculate Cube statistics. The default value is 1
+- `kylin.engine.mr.build-dict-in-reducer`: whether to build the dictionary in
the Reduce phase of the build job *Extract Fact Table Distinct Columns*. The
default value is `TRUE`
+- `kylin.engine.mr.yarn-check-interval-seconds`: how often the build engine checks the Hadoop job's status. The default value is 10(s)
+
+
+
+### Dictionary-related {#dict-config}
+
+- `kylin.dictionary.use-forest-trie`: The default value is TRUE
+- `kylin.dictionary.forest-trie-max-mb`: The default value is 500
+- `kylin.dictionary.max-cache-entry`: The default value is 3000
+- `kylin.dictionary.growing-enabled`: The default value is FALSE
+- `kylin.dictionary.append-entry-size`: The default value is 10000000
+- `kylin.dictionary.append-max-versions`: The default value is 3
+- `kylin.dictionary.append-version-ttl`: The default value is 259200000
+- `kylin.dictionary.reusable`: whether to reuse the dictionary. The default value is FALSE
+- `kylin.dictionary.shrunken-from-global-enabled`: whether to reduce the size of the global dictionary. The default value is *FALSE*
+
+
+
+### Deal with Ultra-High-Cardinality Columns {#uhc-config}
+
+- `kylin.engine.mr.build-uhc-dict-in-additional-step`: the default value is *FALSE*; set it to *TRUE* to build dictionaries for ultra-high-cardinality columns in an additional step
+- `kylin.engine.mr.uhc-reducer-count`: the default value is 1, which can be set to 5 to allocate 5 Reducers for each ultra-high-cardinality column.
+
+
+
+### Spark as Build Engine {#spark-cubing}
+
+- `kylin.engine.spark-conf.spark.master`: specifies the Spark operation mode.
The default value is *yarn*
+- `kylin.engine.spark-conf.spark.submit.deployMode`: specifies the deployment
mode of Spark on YARN. The default value is *cluster*
+- `kylin.engine.spark-conf.spark.yarn.queue`: specifies the Spark resource
queue. The default value is *default*
+- `kylin.engine.spark-conf.spark.driver.memory`: specifies the Spark Driver memory. The default value is 2G.
+- `kylin.engine.spark-conf.spark.executor.memory`: specifies the Spark Executor memory. The default value is 4G.
+- `kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead`: specifies the size of the Spark Executor off-heap memory. The default value is 1024(MB).
+- `kylin.engine.spark-conf.spark.executor.cores`: specifies the number of cores available to a single Spark Executor. The default value is 1
+- `kylin.engine.spark-conf.spark.network.timeout`: specifies the Spark network timeout. The default value is 600(s)
+- `kylin.engine.spark-conf.spark.executor.instances`: specifies the number of
Spark Executors owned by an Application. The default value is 1
+- `kylin.engine.spark-conf.spark.eventLog.enabled`: whether to record the
Spark event. The default value is *TRUE*
+- `kylin.engine.spark-conf.spark.hadoop.dfs.replication`: replication number
of HDFS, default is 2
+-
`kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress`:
whether to compress the output. The default value is *TRUE*
+-
`kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec`:
specifies Output compression, default is
*org.apache.hadoop.io.compress.DefaultCodec*
+- `kylin.engine.spark.rdd-partition-cut-mb`: Kylin uses the size of this
parameter to split the partition. The default value is 10 (MB)
+- `kylin.engine.spark.min-partition`: specifies the minimum number of
partitions. The default value is 1
+- `kylin.engine.spark.max-partition`: specifies maximum number of partitions,
default is 5000
+- `kylin.engine.spark.storage-level`: specifies RDD partition data cache
level, default value is *MEMORY_AND_DISK_SER*
+- `kylin.engine.spark-conf-mergedict.spark.executor.memory`: requests more memory for merging dictionaries. The default value is 6G.
+- `kylin.engine.spark-conf-mergedict.spark.memory.fraction`: specifies the percentage of memory reserved for the system. The default value is 0.2
+
+> Tip: For more information, please refer to [Building Cubes with
Spark](/docs/tutorial/cube_spark.html).
+
+
+
+### Spark Dynamic Allocation {#dynamic-allocation}
+
+- `kylin.engine.spark-conf.spark.shuffle.service.enabled`: whether to enable
shuffle service
+- `kylin.engine.spark-conf.spark.dynamicAllocation.enabled`: whether to enable
Spark Dynamic Allocation
+- `kylin.engine.spark-conf.spark.dynamicAllocation.initialExecutors`:
specifies the initial number of Executors
+- `kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors`: specifies
the minimum number of Executors retained
+- `kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors`: specifies
the maximum number of Executors applied for
+- `kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout`:
specifies the threshold of Executor being removed after being idle. The default
value is 60(s)
+
+> Tip: For more information, please refer to the official documentation:
[Dynamic Resource
Allocation](http://spark.apache.org/docs/1.6.2/job-scheduling.html#dynamic-resource-allocation).
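+
+As a sketch (the executor counts are hypothetical and cluster-dependent, and the shuffle service must also be enabled on the resource manager side), the `kylin.properties` part could be:
+
+```properties
+kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=20
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=60
+```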
+
+
+
+### Job-related {#job-config}
+
+- `kylin.job.log-dir`: the default value is */tmp/kylin/logs*
+- `kylin.job.allow-empty-segment`: whether to tolerate an empty data source. The default value is *TRUE*
+- `kylin.job.max-concurrent-jobs`: specifies the maximum build concurrency. The default value is 10
+- `kylin.job.retry`: specifies retry times after the job is failed. The
default value is 0
+- `kylin.job.scheduler.priority-considered`: whether to consider the job
priority. The default value is FALSE
+- `kylin.job.scheduler.priority-bar-fetch-from-queue`: specifies the time
interval for getting jobs from the priority queue. The default value is 20(s)
+- `kylin.job.scheduler.poll-interval-second`: The time interval for getting
the job from the queue. The default value is 30(s)
+- `kylin.job.error-record-threshold`: specifies the threshold for the job to
throw an error message. The default value is 0
+- `kylin.job.cube-auto-ready-enabled`: whether to enable Cube automatically
after the build is complete. The default value is *TRUE*
+- `kylin.cube.max-building-segments`: specifies the maximum number of building segments for one Cube. The default value is 10
+
+
+
+### Enable Email Notification {#email-notification}
+
+- `kylin.job.notification-enabled`: whether to send email notification when a job succeeds or fails. The default value is *FALSE*
+- `kylin.job.notification-mail-enable-starttls`: whether to enable starttls. The default value is *FALSE*
+- `kylin.job.notification-mail-host`: specifies the SMTP server address for mail
+- `kylin.job.notification-mail-port`: specifies the SMTP server port for mail. The default value is 25
+- `kylin.job.notification-mail-username`: specifies the mail login username
+- `kylin.job.notification-mail-password`: specifies the password of the mail username
+- `kylin.job.notification-mail-sender`: specifies the sender email address
+- `kylin.job.notification-admin-emails`: specifies the administrator's email addresses for notifications
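+
+Put together, a minimal sketch (all hosts and addresses are hypothetical) could look like:
+
+```properties
+kylin.job.notification-enabled=true
+kylin.job.notification-mail-host=smtp.example.com
+kylin.job.notification-mail-port=25
+kylin.job.notification-mail-username=kylin-robot
+kylin.job.notification-mail-password=secret
+kylin.job.notification-mail-sender=kylin-robot@example.com
+kylin.job.notification-admin-emails=admin@example.com
+```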
+
+
+
+### Enable Cube Planner {#cube-planner}
+
+- `kylin.server.query-metrics2-enabled`: the default value is *TRUE*
+- `kylin.metrics.reporter-query-enabled`: the default value is *TRUE*
+- `kylin.metrics.reporter-job-enabled`: the default value is *TRUE*
+- `kylin.metrics.monitor-enabled`: the default value is *TRUE*
+- `kylin.cube.cubeplanner.enabled`: whether to enable Cube Planner. The default value is *TRUE*
+- `kylin.cube.cubeplanner.enabled-for-existing-cube`: whether to enable Cube
Planner for the existing Cube. The default value is *TRUE*
+- `kylin.cube.cubeplanner.algorithm-threshold-greedy`: the default value is 8
+- `kylin.cube.cubeplanner.expansion-threshold`: the default value is 15.0
+- `kylin.cube.cubeplanner.recommend-cache-max-size`: the default value is 200
+- `kylin.cube.cubeplanner.mandatory-rollup-threshold`: the default value is
1000
+- `kylin.cube.cubeplanner.algorithm-threshold-genetic`: the default value is 23
+
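+As a minimal sketch, setting all of the Cube Planner switches above explicitly in `kylin.properties` would look like this:
+
+```properties
+kylin.cube.cubeplanner.enabled=true
+kylin.server.query-metrics2-enabled=true
+kylin.metrics.reporter-query-enabled=true
+kylin.metrics.reporter-job-enabled=true
+kylin.metrics.monitor-enabled=true
+```
+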
+> Tip: For more information, please refer to [Using Cube
Planner](/docs/tutorial/use_cube_planner.html).
+
+
+
+### HBase Storage {#hbase-config}
+
+- `kylin.storage.hbase.table-name-prefix`: specifies the prefix of HTable. The
default value is *KYLIN\_*
+- `kylin.storage.hbase.namespace`: specifies the default namespace of HBase
Storage. The default value is *default*
+- `kylin.storage.hbase.coprocessor-local-jar`: specifies jar package related
to HBase coprocessor
+- `kylin.storage.hbase.coprocessor-mem-gb`: specifies the HBase coprocessor
memory. The default value is 3.0(GB).
+- `kylin.storage.hbase.run-local-coprocessor`: whether to run the local HBase
coprocessor. The default value is *FALSE*
+- `kylin.storage.hbase.coprocessor-timeout-seconds`: specifies the timeout
period. The default value is 0
+- `kylin.storage.hbase.region-cut-gb`: specifies the size of a single Region. The default value is 5.0 (GB)
+- `kylin.storage.hbase.min-region-count`: specifies the minimum number of
regions. The default value is 1
+- `kylin.storage.hbase.max-region-count`: specifies the maximum number of
Regions. The default value is 500
+- `kylin.storage.hbase.hfile-size-gb`: specifies the HFile size. The default
value is 2.0(GB)
+- `kylin.storage.hbase.max-scan-result-bytes`: specifies the maximum value of
the scan return. The default value is 5242880 (byte), which is 5 (MB).
+- `kylin.storage.hbase.compression-codec`: specifies the compression algorithm. The default value is *none*, that is, compression is not enabled
+- `kylin.storage.hbase.rowkey-encoding`: specifies the encoding method of
Rowkey. The default value is *FAST_DIFF*
+- `kylin.storage.hbase.block-size-bytes`: the default value is 1048576
+- `kylin.storage.hbase.small-family-block-size-bytes`: specifies the block
size. The default value is 65536 (byte), which is 64 (KB).
+- `kylin.storage.hbase.owner-tag`: specifies the owner of the Kylin platform.
The default value is [email protected]
+- `kylin.storage.hbase.endpoint-compress-result`: whether to compress the returned result. The default value is *TRUE*
+- `kylin.storage.hbase.max-hconnection-threads`: specifies the maximum number
of connection threads. The default value is 2048.
+- `kylin.storage.hbase.core-hconnection-threads`: specifies the number of core
connection threads. The default value is 2048.
+- `kylin.storage.hbase.hconnection-threads-alive-seconds`: specifies the
thread lifetime. The default value is 60.
+- `kylin.storage.hbase.replication-scope`: specifies the cluster replication
range. The default value is 0
+- `kylin.storage.hbase.scan-cache-rows`: specifies the number of scan cache
lines. The default value is 1024.
+
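+For example, a hypothetical fragment that places Kylin's HTables in a dedicated namespace and tunes region sizing (the namespace name is a placeholder):
+
+```properties
+kylin.storage.hbase.namespace=KYLIN_PROD
+kylin.storage.hbase.region-cut-gb=5.0
+kylin.storage.hbase.max-region-count=500
+```
+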
+
+
+### Enable Compression {#compress-config}
+
+Kylin does not enable compression by default. An unsupported compression algorithm can hinder Kylin's build jobs, but a suitable compression algorithm can reduce storage and network overhead and improve overall system efficiency.
+Kylin can use three types of compression: HBase table compression, Hive output compression, and MapReduce job output compression.
+> *Note*: The compression settings will not take effect until the Kylin
instance is restarted.
+
+* HBase table compression
+
+This compression is configured by `kylin.hbase.default.compression.codec` in `kylin.properties`. Optional values include `none`, `snappy`, `lzo`, `gzip` and `lz4`. The default value is `none`, which means no data is compressed.
+> *Note*: Before modifying the compression algorithm, make sure your HBase cluster supports the selected compression algorithm.
+
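+For example, assuming Snappy is available on the HBase cluster, a one-line sketch would be:
+
+```properties
+kylin.hbase.default.compression.codec=snappy
+```
+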
+
+* Hive output compression
+
+This compression is configured in `kylin_hive_conf.xml`. The default configuration is empty, which means that Hive's default configuration is used directly. If you want to override the configuration, add (or replace) the following properties in `kylin_hive_conf.xml`. Take SNAPPY compression as an example:
+
+```xml
+<property>
+<name>mapreduce.map.output.compress.codec</name>
+<value>org.apache.hadoop.io.compress.SnappyCodec</value>
+<description></description>
+</property>
+<property>
+<name>mapreduce.output.fileoutputformat.compress.codec</name>
+<value>org.apache.hadoop.io.compress.SnappyCodec</value>
+<description></description>
+</property>
+```
+
+* MapReduce job output compression
+
+This compression is configured via `kylin_job_conf.xml` and `kylin_job_conf_inmem.xml`. The default is empty, which uses MapReduce's default configuration. If you want to override the configuration, add (or replace) the following properties in `kylin_job_conf.xml` and `kylin_job_conf_inmem.xml`. Take SNAPPY compression as an example:
+
+```xml
+<property>
+<name>mapreduce.map.output.compress.codec</name>
+<value>org.apache.hadoop.io.compress.SnappyCodec</value>
+<description></description>
+</property>
+<property>
+<name>mapreduce.output.fileoutputformat.compress.codec</name>
+<value>org.apache.hadoop.io.compress.SnappyCodec</value>
+<description></description>
+</property>
+```
+
+
+
+### Query Configuration {#kylin-query}
+
+This section introduces Kylin query-related configuration.
+
+
+
+### Query-related {#query-config}
+
+- `kylin.query.skip-empty-segments`: whether to skip empty segments when
querying. The default value is *TRUE*
+- `kylin.query.large-query-threshold`: specifies the maximum number of rows
returned. The default value is 1000000.
+- `kylin.query.security-enabled`: whether to check the ACL when querying. The
default value is *TRUE*
+- `kylin.query.security.table-acl-enabled`: whether to check the ACL of the
corresponding table when querying. The default value is *TRUE*
+- `kylin.query.calcite.extras-props.conformance`: specifies whether to parse SQL strictly. The default value is *LENIENT*
+- `kylin.query.calcite.extras-props.caseSensitive`: whether identifiers are case sensitive. The default value is *TRUE*
+- `kylin.query.calcite.extras-props.unquotedCasing`: optional values include `UNCHANGED`, `TO_UPPER` and `TO_LOWER`. The default value is *TO_UPPER*, that is, all uppercase
+- `kylin.query.calcite.extras-props.quoting`: specifies how identifiers are quoted. Optional values include `DOUBLE_QUOTE`, `BACK_TICK` and `BRACKET`. The default value is *DOUBLE_QUOTE*
+- `kylin.query.statement-cache-max-num`: specifies the maximum number of
cached PreparedStatements. The default value is 50000
+- `kylin.query.statement-cache-max-num-per-key`: specifies the maximum number
of PreparedStatements per key cache. The default value is 50.
+- `kylin.query.enable-dict-enumerator`: whether to enable the dictionary enumerator. The default value is *FALSE*
+- `kylin.query.enable-dynamic-column`: whether to enable dynamic columns. The default value is *FALSE*; set it to *TRUE* to support, for example, counting the non-NULL rows of a column
+
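+For example, a hypothetical fragment adjusting identifier handling (illustrative values):
+
+```properties
+kylin.query.calcite.extras-props.caseSensitive=true
+kylin.query.calcite.extras-props.unquotedCasing=TO_UPPER
+```
+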
+
+
+### Fuzzy Query {#fuzzy}
+
+- `kylin.storage.hbase.max-fuzzykey-scan`: specifies the threshold for the
scanned fuzzy key. If the value is exceeded, the fuzzy key will not be scanned.
The default value is 200.
+- `kylin.storage.hbase.max-fuzzykey-scan-split`: split the large fuzzy key set
to reduce the number of fuzzy keys per scan. The default value is 1
+- `kylin.storage.hbase.max-visit-scanrange`: the default value is 1000000
+
+
+
+### Query Cache {#cache-config}
+
+- `kylin.query.cache-enabled`: whether to enable caching. The default value is *TRUE*
+- `kylin.query.cache-threshold-duration`: a query whose duration exceeds this threshold has its result saved in the cache. The default value is 2000 (ms)
+- `kylin.query.cache-threshold-scan-count`: a query whose scanned row count exceeds this threshold has its result saved in the cache. The default value is 10240 (rows)
+- `kylin.query.cache-threshold-scan-bytes`: a query whose scanned bytes exceed this threshold has its result saved in the cache. The default value is 1048576 (bytes)
+
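+As a sketch, the following fragment keeps the cache enabled but only caches queries slower than 5 seconds (an illustrative value, not the default):
+
+```properties
+kylin.query.cache-enabled=true
+kylin.query.cache-threshold-duration=5000
+```
+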
+
+
+### Query Limits {#query-limit}
+
+- `kylin.query.timeout-seconds`: specifies the query timeout in seconds. The default value is 0, that is, no timeout limit on queries; a value smaller than 60 is treated as 60 seconds.
+- `kylin.query.timeout-seconds-coefficient`: specifies the coefficient of the
query timeout seconds. The default value is 0.5.
+- `kylin.query.max-scan-bytes`: specifies the maximum bytes scanned by the
query. The default value is 0, that is, there is no limit.
+- `kylin.storage.partition.max-scan-bytes`: specifies the maximum number of
bytes for the query scan. The default value is 3221225472 (bytes), which is 3GB.
+- `kylin.query.max-return-rows`: specifies the maximum number of rows returned
by the query. The default value is 5000000.
+
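+For example, a hypothetical fragment capping query runtime and result size (illustrative values, not defaults):
+
+```properties
+kylin.query.timeout-seconds=300
+kylin.query.max-return-rows=1000000
+```
+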
+
+
+### Query Pushdown {#query-pushdown}
+
+- `kylin.query.pushdown.runner-class-name=org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl`: setting this value enables query pushdown
+- `kylin.query.pushdown.jdbc.url`: specifies JDBC URL
+- `kylin.query.pushdown.jdbc.driver`: specifies JDBC driver class name. The
default value is *org.apache.hive.jdbc.HiveDriver*
+- `kylin.query.pushdown.jdbc.username`: specifies the Username of the JDBC
database. The default value is *hive*
+- `kylin.query.pushdown.jdbc.password`: specifies the JDBC password for the database
+- `kylin.query.pushdown.jdbc.pool-max-total`: specifies the maximum number of
connections to the JDBC connection pool. The default value is 8.
+- `kylin.query.pushdown.jdbc.pool-max-idle`: specifies the maximum number of
idle connections for the JDBC connection pool. The default value is 8.
+- `kylin.query.pushdown.jdbc.pool-min-idle`: the default value is 0
+- `kylin.query.pushdown.update-enabled`: specifies whether to enable update in
Query Pushdown. The default value is *FALSE*
+- `kylin.query.pushdown.cache-enabled`: whether to enable the cache of the
pushdown query to improve the query efficiency of the same query. The default
value is *FALSE*
+
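+Putting these together, a minimal pushdown sketch could look like the following (the JDBC URL is a placeholder for your HiveServer2 address):
+
+```properties
+kylin.query.pushdown.runner-class-name=org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl
+kylin.query.pushdown.jdbc.url=jdbc:hive2://hiveserver:10000/default
+kylin.query.pushdown.jdbc.driver=org.apache.hive.jdbc.HiveDriver
+kylin.query.pushdown.jdbc.username=hive
+```
+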
+> Tip: For more information, please refer to [Query
Pushdown](/docs/tutorial/query_pushdown.html)
+
+
+
+### Query rewriting {#convert-sql}
+
+- `kylin.query.force-limit`: shortens query latency by forcing a LIMIT clause onto `select *` statements. The default value is *-1*; when the parameter is set to a positive integer such as 1000, that value is applied as the LIMIT clause, so the query is eventually converted to `select * from fact_table limit 1000`
+
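+For example, to force a LIMIT of 1000 rows:
+
+```properties
+kylin.query.force-limit=1000
+```
+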
+
+
+### Collect Query Metrics to JMX {#jmx-metrics}
+
+- `kylin.server.query-metrics-enabled`: the default value is *FALSE*, set to
*TRUE* to collect query metrics to JMX
+
+> Tip: For more information, please refer to
[JMX](https://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html)
+
+
+
+### Collect Query Metrics to dropwizard {#dropwizard-metrics}
+
+- `kylin.server.query-metrics2-enabled`: the default value is *FALSE*, set to
*TRUE* to collect query metrics into dropwizard
+
+> Tip: For more information, please refer to
[dropwizard](https://metrics.dropwizard.io/4.0.0/)
+
+
+
+### Security Configuration {#kylin-security}
+
+This section introduces Kylin security-related configuration.
+
+
+
+### Integrated LDAP for SSO {#ldap-sso}
+
+- `kylin.security.profile=ldap`: setting this value enables LDAP authentication
+- `kylin.security.ldap.connection-server`: specifies LDAP server, such as
*ldap://ldap_server:389*
+- `kylin.security.ldap.connection-username`: specifies LDAP username
+- `kylin.security.ldap.connection-password`: specifies LDAP password
+- `kylin.security.ldap.user-search-base`: specifies the scope of users synced
to Kylin
+- `kylin.security.ldap.user-search-pattern`: specifies the username pattern matched during login verification
+- `kylin.security.ldap.user-group-search-base`: specifies the scope of the
user group synchronized to Kylin
+- `kylin.security.ldap.user-group-search-filter`: specifies the type of user
synced to Kylin
+- `kylin.security.ldap.service-search-base`: needs to be specified when a service account is required to access Kylin
+- `kylin.security.ldap.service-search-pattern`: needs to be specified when a service account is required to access Kylin
+- `kylin.security.ldap.service-group-search-base`: needs to be specified when a service account is required to access Kylin
+- `kylin.security.acl.admin-role`: maps an LDAP group to the admin role (the group name is case sensitive)
+- `kylin.server.auth-user-cache.expire-seconds`: specifies LDAP user
information cache time, default is 300(s)
+- `kylin.server.auth-user-cache.max-entries`: specifies maximum number of LDAP
users, default is 100
+
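+As a minimal sketch, an LDAP setup in `kylin.properties` might look like this (the server address, credentials and search base are placeholders for your directory layout):
+
+```properties
+kylin.security.profile=ldap
+kylin.security.ldap.connection-server=ldap://ldap_server:389
+kylin.security.ldap.connection-username=cn=admin,dc=example,dc=com
+kylin.security.ldap.connection-password=secret
+kylin.security.ldap.user-search-base=ou=people,dc=example,dc=com
+```
+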
+
+
+### Integrate with Apache Ranger {#ranger}
+
+-
`kylin.server.external-acl-provider=org.apache.ranger.authorization.kylin.authorizer.RangerKylinAuthorizer`
+
+> Tip: For more information, please refer to the Ranger documentation on [how to integrate the Kylin plugin](https://cwiki.apache.org/confluence/display/RANGER/Kylin+Plugin).
+
+
+
+### Enable ZooKeeper ACL {#zookeeper-acl}
+
+- `kylin.env.zookeeper-acl-enabled`: whether to enable ZooKeeper ACL, which prevents unauthorized users from accessing Znodes and reduces the risk of resulting bad operations. The default value is *FALSE*
+- `kylin.env.zookeeper.zk-auth`: uses `username:password` as the ACL identifier. The default value is *digest:ADMIN:KYLIN*
+- `kylin.env.zookeeper.zk-acl`: uses a single ID as the ACL identifier. The default value is *world:anyone:rwcda*, where *anyone* means any user
\ No newline at end of file
diff --git a/website/_docs/install/hadoop_evn.md
b/website/_docs/install/hadoop_evn.md
deleted file mode 100644
index dcf288a824..0000000000
--- a/website/_docs/install/hadoop_evn.md
+++ /dev/null
@@ -1,24 +0,0 @@
----
-layout: docs
-title: "Hadoop Environment"
-categories: install
-permalink: /docs/install/hadoop_env.html
----
-
-Kylin need run in a Hadoop node, to get better stability, we suggest you to
deploy it a pure Hadoop client machine, on which the command lines like
`hive`, `hbase`, `hadoop`, `hdfs` already be installed and configured. The
Linux account that running Kylin has got permission to the Hadoop cluster,
including create/write hdfs, hive tables, hbase tables and submit MR jobs.
-
-## Software dependencies
-
-* Hadoop: 2.7+
-* Hive: 0.13 - 1.2.1+
-* HBase: 1.1+
-* Spark 2.1
-* JDK: 1.7+
-* OS: Linux only, CentOS 6.5+ or Ubuntu 16.0.4+
-
-Tested with Hortonworks HDP 2.2 - 2.6, Cloudera CDH 5.7 - 5.11, AWS EMR 5.7 -
5.10, Azure HDInsight 3.5 - 3.6.
-
-For trial and development purpose, we recommend you try Kylin with an
all-in-one sandbox VM, like [HDP
sandbox](http://hortonworks.com/products/hortonworks-sandbox/), and give it 10
GB memory. To avoid permission issue in the sandbox, you can use its `root`
account. We also suggest you using bridged mode instead of NAT mode in Virtual
Box settings. Bridged mode will assign your sandbox an independent IP address
so that you can avoid issues like
[this](https://github.com/KylinOLAP/Kylin/issues/12).
-
-
-
diff --git a/website/_docs/install/index.cn.md
b/website/_docs/install/index.cn.md
index ef920a77de..d762f9cfb5 100644
--- a/website/_docs/install/index.cn.md
+++ b/website/_docs/install/index.cn.md
@@ -15,15 +15,18 @@ permalink: /cn/docs/install/index.html
* JDK: 1.8+ (since v2.5)
* OS: Linux only, CentOS 6.5+ or Ubuntu 16.0.4+
-在 Hortonworks HDP 2.2 - 2.6 and 3.0, Cloudera CDH 5.7 - 5.11 and 6.0, AWS EMR
5.7 - 5.10, Azure HDInsight 3.5 - 3.6 上测试通过。
+在 Hortonworks HDP 2.2-2.6 and 3.0, Cloudera CDH 5.7-5.11 and 6.0, AWS EMR
5.7-5.10, Azure HDInsight 3.5-3.6 上测试通过。
我们建议您使用集成的 sandbox 来试用 Kylin 或进行开发,比如 [HDP
sandbox](http://hortonworks.com/products/hortonworks-sandbox/),且要保证其有至少 10 GB
内存。在配置沙箱时,我们推荐您使用 Bridged Adapter 模型替代 NAT 模型。
+
+
### 硬件要求
运行 Kylin 的服务器的最低配置为 4 core CPU,16 GB 内存和 100 GB 磁盘。 对于高负载的场景,建议使用 24 core
CPU,64 GB 内存或更高的配置。
+
### Hadoop 环境
Kylin 依赖于 Hadoop 集群处理大量的数据集。您需要准备一个配置好 HDFS,YARN,MapReduce,,Hive,
HBase,Zookeeper 和其他服务的 Hadoop 集群供 Kylin 运行。
@@ -31,6 +34,8 @@ Kylin 可以在 Hadoop 集群的任意节点上启动。方便起见,您可以
运行 Kylin 的 Linux 账户要有访问 Hadoop 集群的权限,包括创建/写入 HDFS 文件夹,Hive 表, HBase 表和提交
MapReduce 任务的权限。
+
+
### Kylin 安装
1. 从 [Apache Kylin下载网站](https://kylin.apache.org/download/) 下载一个适用于您 Hadoop
版本的二进制文件。例如,适用于 HBase 1.x 的 Kylin 2.5.0 可通过如下命令行下载得到:
@@ -49,11 +54,13 @@ export KYLIN_HOME=`pwd`
```
+
### 检查运行环境
Kylin 运行在 Hadoop 集群上,对各个组件的版本、访问权限及 CLASSPATH 等都有一定的要求,为了避免遇到各种环境问题,您可以运行
`$KYLIN_HOME/bin/check-env.sh`
脚本来进行环境检测,如果您的环境存在任何的问题,脚本将打印出详细报错信息。如果没有报错信息,代表您的环境适合 Kylin 运行。
+
### 启动 Kylin
运行 `$KYLIN_HOME/bin/kylin.sh start` 脚本来启动 Kylin,界面输出如下:
@@ -68,6 +75,7 @@ Web UI is at http://<hostname>:7070/kylin
```
+
### 使用 Kylin
Kylin 启动后您可以通过浏览器 `http://<hostname>:7070/kylin` 进行访问。
@@ -76,6 +84,7 @@ Kylin 启动后您可以通过浏览器 `http://<hostname>:7070/kylin` 进行访
服务器启动后,您可以通过查看 `$KYLIN_HOME/logs/kylin.log` 获得运行时日志。
+
### 停止 Kylin
运行 `$KYLIN_HOME/bin/kylin.sh stop` 脚本来停止 Kylin,界面输出如下:
diff --git a/website/_docs/install/index.md b/website/_docs/install/index.md
index a20ad1dfb0..93a853794a 100644
--- a/website/_docs/install/index.md
+++ b/website/_docs/install/index.md
@@ -5,74 +5,96 @@ categories: install
permalink: /docs/install/index.html
---
-## Software requirements
+### Software Requirements
* Hadoop: 2.7+, 3.1+ (since v2.5)
* Hive: 0.13 - 1.2.1+
* HBase: 1.1+, 2.0 (since v2.5)
-* Spark 2.1.1+
+* Spark (optional) 2.1.1+
+* Kafka (optional) 0.10.0+
* JDK: 1.8+ (since v2.5)
* OS: Linux only, CentOS 6.5+ or Ubuntu 16.0.4+
-Tested with Hortonworks HDP 2.4 - 2.6 and 3.0, Cloudera CDH 5.7 - 5.11 and
6.0, AWS EMR 5.7 - 5.10, Azure HDInsight 3.5 - 3.6.
+Tested on Hortonworks HDP 2.2-2.6 and 3.0, Cloudera CDH 5.7-5.11 and 6.0, AWS EMR 5.7-5.10, Azure HDInsight 3.5-3.6.
-For trial and development purpose, we recommend you try Kylin with an
all-in-one sandbox VM, like [Hortonworks HDP
sandbox](http://hortonworks.com/products/hortonworks-sandbox/), and give it 10
GB memory. We suggest you using bridged mode instead of NAT mode in Virtual Box
settings.
+We recommend trying out Kylin, or developing against it, in an integrated sandbox such as the [HDP sandbox](http://hortonworks.com/products/hortonworks-sandbox/), with at least 10 GB of memory. When configuring the sandbox, we recommend using Bridged Adapter mode instead of NAT mode.
-## Hardware requirements
-The server to run Kylin need 4 core CPU, 16 GB memory and 100 GB disk as the
minimal configuration. For high workload scenario, 24 core CPU, 64 GB memory or
more is recommended.
+### Hardware Requirements
-## Hadoop Environment
+The minimum configuration of a server running Kylin is 4 core CPU, 16 GB RAM
and 100 GB disk. For high-load scenarios, a 24-core CPU, 64 GB RAM or higher is
recommended.
-Kylin depends on Hadoop cluster to process the massive data set. You need
prepare a well configured Hadoop cluster for Kylin to run, with the common
services includes HDFS, YARN, MapReduce, Hive, HBase, Zookeeper and other
services. It is most common to install Kylin on a Hadoop client machine, from
which Kylin can talk with the Hadoop cluster via command lines including
`hive`, `hbase`, `hadoop`, etc.
-Kylin itself can be started in any node of the Hadoop cluster. For simplity,
you can run it in the master node. But to get better stability, we suggest you
to deploy it a pure Hadoop client node, on which the command lines like `hive`,
`hbase`, `hadoop`, `hdfs` already be installed and the client congfigurations
(core-site.xml, hive-site.xml, hbase-site.xml, etc) are properly configured and
will be automatically synced with other nodes. The Linux account that running
Kylin has the permission to access the Hadoop cluster, including create/write
HDFS folders, hive tables, hbase tables and submit MR jobs.
-## Installation Kylin
+### Hadoop Environment
- * Download a version of Kylin binaries for your Hadoop version from a closer
Apache download site. For example, Kylin 2.3.1 for HBase 1.x from US:
-{% highlight Groff markup %}
-cd /usr/local
-wget
http://www-us.apache.org/dist/kylin/apache-kylin-2.3.1/apache-kylin-2.3.1-hbase1x-bin.tar.gz
-{% endhighlight %}
- * Uncompress the tarball and then export KYLIN_HOME pointing to the Kylin
folder
-{% highlight Groff markup %}
-tar -zxvf apache-kylin-2.3.1-hbase1x-bin.tar.gz
-cd apache-kylin-2.3.1-bin
-export KYLIN_HOME=`pwd`
-{% endhighlight %}
- * Make sure the user has the privilege to run hadoop, hive and hbase cmd in
shell. If you are not so sure, you can run `$KYLIN_HOME/bin/check-env.sh`, it
will print out the detail information if you have some environment issues. If
no error, that means the environment is ready.
-{% highlight Groff markup %}
--bash-4.1# $KYLIN_HOME/bin/check-env.sh
-Retrieving hadoop conf dir...
-KYLIN_HOME is set to /usr/local/apache-kylin-2.3.1-bin
--bash-4.1#
-{% endhighlight %}
- * Start Kylin, run `$KYLIN_HOME/bin/kylin.sh start`, after the server starts,
you can watch `$KYLIN_HOME/logs/kylin.log` for runtime logs;
-{% highlight Groff markup %}
--bash-4.1# $KYLIN_HOME/bin/kylin.sh start
-Retrieving hadoop conf dir...
-KYLIN_HOME is set to /usr/local/apache-kylin-2.3.1-bin
-Retrieving hive dependency...
-Retrieving hbase dependency...
+Kylin relies on Hadoop clusters to handle large data sets. You need to prepare
a Hadoop cluster with HDFS, YARN, MapReduce, Hive, HBase, Zookeeper and other
services for Kylin to run.
+Kylin can be launched on any node in a Hadoop cluster. For convenience, you can run Kylin on the master node. For better stability, however, it is recommended to deploy Kylin on a clean Hadoop client node, on which command-line tools such as Hive, HBase and HDFS are installed and the client configurations (such as `core-site.xml`, `hive-site.xml`, `hbase-site.xml` and others) are properly configured and automatically synchronized with the other nodes.
+
+Linux accounts running Kylin must have access to the Hadoop cluster, including
the permission to create/write HDFS folders, Hive tables, HBase tables, and
submit MapReduce tasks.
+
+
+
+### Kylin Installation
+
+1. Download a binary package for your Hadoop version from the [Apache Kylin Download Site](https://kylin.apache.org/download/). For example, Kylin 2.5.0 for HBase 1.x can be downloaded with the following commands:
+
+```shell
+cd /usr/local/
+wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
+```
+
+2. Unzip the tarball and set the environment variable `KYLIN_HOME` to the Kylin folder.
+
+```shell
+tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
+cd apache-kylin-2.5.0-bin-hbase1x
+export KYLIN_HOME=`pwd`
+```
+
+
+
+### Checking the Operating Environment
+
+Kylin runs on a Hadoop cluster and has certain requirements for the version, access rights, and CLASSPATH of each component. To avoid environmental problems, you can run the `$KYLIN_HOME/bin/check-env.sh` script to test your environment; if there are any problems, the script will print a detailed error message. If there is no error message, your environment is ready for Kylin to run.
+
+
+
+### Start Kylin
+
+Run the `$KYLIN_HOME/bin/kylin.sh start` script to start Kylin. The console output is as follows:
+
+```
Retrieving hadoop conf dir...
-Retrieving kafka dependency...
-Retrieving Spark dependency...
-...
+KYLIN_HOME is set to /usr/local/apache-kylin-2.5.0-bin-hbase1x
+......
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
-Check the log at /usr/local/apache-kylin-2.3.1-bin/logs/kylin.log
+Check the log at /usr/local/apache-kylin-2.5.0-bin-hbase1x/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin
--bash-4.1#
-{% endhighlight %}
- * After Kylin started you can visit <http://hostname:7070/kylin> in your web
browser. The initial username/password is ADMIN/KYLIN.
- * To stop Kylin, run `$KYLIN_HOME/bin/kylin.sh stop`
-{% highlight Groff markup %}
--bash-4.1# $KYLIN_HOME/bin/kylin.sh stop
-Retrieving hadoop conf dir...
-KYLIN_HOME is set to /usr/local/apache-kylin-2.3.1-bin
-Stopping Kylin: 7014
-Kylin with pid 7014 has been stopped.
-{% endhighlight %}
+```
+
+
+### Using Kylin
+
+Once Kylin is launched, you can access it in a browser at `http://<hostname>:7070/kylin`, replacing `<hostname>` with the server's IP address or domain name; the default port is 7070.
+The initial username and password are `ADMIN/KYLIN`.
+After the server is started, you can view the runtime log,
`$KYLIN_HOME/logs/kylin.log`.
+
+
+
+### Stop Kylin
+
+Run the `$KYLIN_HOME/bin/kylin.sh stop` script to stop Kylin. The console
output is as follows:
+
+```
+Retrieving hadoop conf dir...
+KYLIN_HOME is set to /usr/local/apache-kylin-2.5.0-bin-hbase1x
+Stopping Kylin: 25964
+Stopping in progress. Will check after 2 secs again...
+Kylin with pid 25964 has been stopped.
+```
+You can run `ps -ef | grep kylin` to see if the Kylin process has stopped.
\ No newline at end of file
diff --git a/website/_docs/install/kylin_aws_emr.cn.md
b/website/_docs/install/kylin_aws_emr.cn.md
index 99a19250b5..b0173d8d55 100644
--- a/website/_docs/install/kylin_aws_emr.cn.md
+++ b/website/_docs/install/kylin_aws_emr.cn.md
@@ -5,19 +5,23 @@ categories: install
permalink: /cn/docs/install/kylin_aws_emr.html
---
-今天许多用户将 Hadoop 运行在像 AWS 这样的公有云上。Apache Kylin,由标准的 Hadoop/HBase API 编译,支持多数主流的
Hadoop 发布;现在的版本是 Kylin v2.2,支持 AWS EMR 5.0 - 5.10。本文档介绍了在 EMR 上如何运行 Kylin。
+本文档介绍了在 EMR 上如何运行 Kylin。
+
+
### 推荐版本
* AWS EMR 5.7 (EMR 5.8 及以上,请查看
[KYLIN-3129](https://issues.apache.org/jira/browse/KYLIN-3129))
* Apache Kylin v2.2.0 or above for HBase 1.x
+
+
### 启动 EMR 集群
-使用 AWS 网页控制台,命令行或 API 运行一个 EMR 集群。在 Kylin 需要 HBase 服务的应用中选择 "**HBase**"。
+使用 AWS 网页控制台,命令行或 API 运行一个 EMR 集群。在 Kylin 需要 HBase 服务的应用中选择 **HBase**。
-您可以选择 "HDFS" 或者 "S3" 作为 HBase 的存储,这取决于您在关闭集群之后是否需要将 Cube 数据进行存储。EMR HDFS 使用
EC2 实例的本地磁盘,当集群停止后数据将被清除,Kylin metadata 和 Cube 数据将会丢失。
+您可以选择 HDFS 或者 S3 作为 HBase 的存储,这取决于您在关闭集群之后是否需要将 Cube 数据进行存储。EMR HDFS 使用 EC2
实例的本地磁盘,当集群停止后数据将被清除,Kylin 元数据和 Cube 数据将会丢失。
-如果您使用 "S3" 作为 HBase 的存储,您需要自定义配置为 "**hbase.rpc.timeout**",由于 S3
的大容量负载是一个复制操作,当数据规模比较大时,HBase region 服务器比在 HDFS 上将花费更多的时间等待其完成。
+如果您使用 S3 作为 HBase 的存储,您需要自定义配置为 `hbase.rpc.timeout`,由于 S3
的大容量负载是一个复制操作,当数据规模比较大时,HBase Region 服务器比在 HDFS 上将花费更多的时间等待其完成。
```
[ {
@@ -36,55 +40,58 @@ permalink: /cn/docs/install/kylin_aws_emr.html
]
```
+
+
### 安装 Kylin
当 EMR 集群处于 "Waiting" 状态,您可以 SSH 到 master 节点,下载 Kylin 然后解压 tar 包:
-```
+```sh
sudo mkdir /usr/local/kylin
sudo chown hadoop /usr/local/kylin
cd /usr/local/kylin
-wget
http://www-us.apache.org/dist/kylin/apache-kylin-2.2.0/apache-kylin-2.2.0-bin-hbase1x.tar.gz
-tar –zxvf apache-kylin-2.2.0-bin-hbase1x.tar.gz
+wget
http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
+tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
```
### 配置 Kylin
启动 Kylin 前,您需要进行一组配置:
-- 从 /etc/hbase/conf/hbase-site.xml 复制 "hbase.zookeeper.quorum" 属性到
$KYLIN\_HOME/conf/kylin\_job\_conf.xml,例如:
+- 从 `/etc/hbase/conf/hbase-site.xml` 复制 `hbase.zookeeper.quorum` 属性到
`$KYLIN_HOME/conf/kylin_job_conf.xml`,例如:
-```
+```xml
<property>
<name>hbase.zookeeper.quorum</name>
<value>ip-nn-nn-nn-nn.ap-northeast-2.compute.internal</value>
</property>
```
-- 使用 HDFS 作为 "kylin.env.hdfs-working-dir" (推荐)
+- 使用 HDFS 作为 `kylin.env.hdfs-working-dir` (推荐)
-EMR 建议 **"当集群运行时使用 HDFS 作为中间数据的存储而 Amazon S3 只用来输入初始的数据和输出的最终结果"**。Kylin 的
'hdfs-working-dir' 用来存放 Cube building 时的中间数据,cuboid 文件和一些 metadata 文件 (例如在
Hbase 中不好的 dictionary 和 table snapshots);因此最好为其配置 HDFS。
+EMR 建议 **当集群运行时使用 HDFS 作为中间数据的存储而 Amazon S3 只用来输入初始的数据和输出的最终结果**。Kylin 的
`hdfs-working-dir` 用来存放 Cube 构建时的中间数据,cuboid 数据文件和一些元数据文件 (例如不便于在 HBase 中存储的
`/dictionary` 和 `/table snapshots`),因此最好为其配置 HDFS。
如果使用 HDFS 作为 Kylin 的工作目录,您无需做任何修改,因为 EMR 的默认文件系统是 HDFS:
-```
+```properties
kylin.env.hdfs-working-dir=/kylin
```
-关闭/重启集群前,您必须用
[S3DistCp](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html)
备份 HDFS 上 "/kylin" 路径下的数据到 S3,否则您可能丢失数据且之后不能恢复集群。
+关闭/重启集群前,您必须用
[S3DistCp](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html)
备份 HDFS 上 `/kylin` 路径下的数据到 S3,否则您可能丢失数据且之后不能恢复集群。
-- 使用 S3 作为 "kylin.env.hdfs-working-dir"
+- 使用 S3 作为 `kylin.env.hdfs-working-dir`
如果您想使用 S3 作为存储 (假设 HBase 也在 S3 上),您需要配置下列参数:
-```
+```properties
kylin.env.hdfs-working-dir=s3://yourbucket/kylin
kylin.storage.hbase.cluster-fs=s3://yourbucket
kylin.source.hive.redistribute-flat-table=false
```
-中间文件和 HFile 也都会写入 S3。Build 性能将会比 HDFS 慢。确保您很好的理解了 S3 和 HDFS 的区别。阅读下列来自 AWS 的文章:
+中间文件和 HFile 也都会写入 S3。构建性能将会比 HDFS 慢。
+为了很好地理解 S3 和 HDFS 的区别,请参考如下来自 AWS 的两篇文章:
[Input and Output
Errors](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-errors-io.html)
[Are you having trouble loading data to or from Amazon S3 into
Hive](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-error-hive.html#emr-troubleshoot-error-hive-3)
@@ -94,7 +101,7 @@ kylin.source.hive.redistribute-flat-table=false
根据
[emr-troubleshoot-errors-io](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-errors-io.html),为在
S3 上获得更好的性能和数据一致性需要应用一些 Hadoop 配置。
-```
+```xml
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
@@ -121,13 +128,13 @@ kylin.source.hive.redistribute-flat-table=false
- 如果不存在创建工作目录文件夹
-```
+```sh
hadoop fs -mkdir /kylin
```
或
-```
+```sh
hadoop fs -mkdir s3://yourbucket/kylin
```
@@ -135,21 +142,23 @@ hadoop fs -mkdir s3://yourbucket/kylin
启动和在普通 Hadoop 上一样:
-```
+```sh
export KYLIN_HOME=/usr/local/kylin/apache-kylin-2.2.0-bin
$KYLIN_HOME/bin/sample.sh
$KYLIN_HOME/bin/kylin.sh start
```
-别忘记在 EMR master - "ElasticMapReduce-master" 的安全组中启用 7070 端口访问,或使用 SSH 连接
master 节点,然后您可以使用 http://\<master\-dns\>:7070/kylin 访问 Kylin Web GUI。
+别忘记在 EMR master - "ElasticMapReduce-master" 的安全组中启用 7070 端口访问,或使用 SSH 连接
master 节点,然后您可以使用 `http://<master-dns>:7070/kylin` 访问 Kylin Web GUI。
Build 同一个 Cube,当 Cube 准备好后运行查询。您可以浏览 S3 查看数据是否安全的持久化了。
+
+
### Spark 配置
EMR 的 Spark 版本很可能与 Kylin 编译的版本不一致,因此您通常不能直接使用 EMR 打包的 Spark 用于 Kylin 的任务。
您需要在启动 Kylin 之前,将 "SPARK_HOME" 环境变量设置指向 Kylin 的 Spark 子目录 (KYLIN_HOME/spark)
。此外,为了从 Spark 中访问 S3 或 EMRFS 上的文件,您需要将 EMR 的扩展类从 EMR 的目录拷贝到 Kylin 的 Spark 下。
-```
+```sh
export SPARK_HOME=$KYLIN_HOME/spark
cp /usr/lib/hadoop-lzo/lib/*.jar $KYLIN_HOME/spark/jars/
@@ -161,17 +170,19 @@ $KYLIN_HOME/bin/kylin.sh start
您也可以参考 EMR Spark 的 spark-defauts 来设置 Kylin 的 Spark 配置,以获得更好的对集群资源的适配。
+
+
### 关闭 EMR 集群
关闭 EMR 集群前,我们建议您为 Kylin metadata 做备份且将其上传到 S3。
为了在关闭 Amazon EMR 集群时不丢失没写入 Amazon S3 的数据,MemStore cache 需要刷新到 Amazon S3 写入新的
store 文件。您可以运行 EMR 集群上提供的 shell 脚本来完成这个需求。
-```
+```sh
bash /usr/lib/hbase/bin/disable_all_tables.sh
```
-为了用同样的 Hbase 数据重启一个集群,可在 AWS Management Console 中指定和之前集群相同的 Amazon S3 位置或使用
"hbase.rootdir" 配置属性。更多的 EMR HBase 信息,参考 [HBase on Amazon
S3](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html)
+为了用同样的 Hbase 数据重启一个集群,可在 AWS Management Console 中指定和之前集群相同的 Amazon S3 位置或使用
`hbase.rootdir` 配置属性。更多的 EMR HBase 信息,参考 [HBase on Amazon
S3](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html)
## 在专用的 EC2 上部署 Kylin
diff --git a/website/_docs/install/kylin_aws_emr.md
b/website/_docs/install/kylin_aws_emr.md
index fef338427a..230e944689 100644
--- a/website/_docs/install/kylin_aws_emr.md
+++ b/website/_docs/install/kylin_aws_emr.md
@@ -5,19 +5,24 @@ categories: install
permalink: /docs/install/kylin_aws_emr.html
---
-Many users run Hadoop on public Cloud like AWS today. Apache Kylin, compiled
with standard Hadoop/HBase API, support most main stream Hadoop releases; The
current version Kylin v2.2, supports AWS EMR 5.0 to 5.10. This document
introduces how to run Kylin on EMR.
+This document introduces how to run Kylin on EMR.
+
+
### Recommended Version
-* AWS EMR 5.7 (for EMR 5.8 and above, please check
[KYLIN-3129](https://issues.apache.org/jira/browse/KYLIN-3129))
+
+* AWS EMR 5.7 (for EMR 5.8 and above, please refer to
[KYLIN-3129](https://issues.apache.org/jira/browse/KYLIN-3129))
* Apache Kylin v2.2.0 or above for HBase 1.x
+
+
### Start EMR cluster
-Launch an EMR cluser with AWS web console, command line or API. Select
"**HBase**" in the applications as Kylin need HBase service.
+Launch an EMR cluster with the AWS web console, command line or API. Select *HBase* in the applications, as Kylin needs the HBase service.
You can select "HDFS" or "S3" as the storage for HBase, depending on whether
you need Cube data be persisted after shutting down the cluster. EMR HDFS uses
the local disk of EC2 instances, which will erase the data when cluster is
stopped, then Kylin metadata and Cube data can be lost.
-If you use "S3" as HBase's storage, you need customize its configuration for
"**hbase.rpc.timeout**", because the bulk load to S3 is a copy operation, when
data size is huge, HBase region server need wait much longer to finish than on
HDFS.
+If you use S3 as HBase's storage, you need to customize the configuration of `hbase.rpc.timeout`, because the bulk load to S3 is a copy operation; when the data size is huge, the HBase region server needs to wait much longer to finish than on HDFS.
```
[ {
@@ -36,49 +41,53 @@ If you use "S3" as HBase's storage, you need customize its
configuration for "**
]
```
+
+
### Install Kylin
When EMR cluser is in "Waiting" status, you can SSH into its master node,
download Kylin and then uncompress the tar ball:
-```
+```sh
sudo mkdir /usr/local/kylin
sudo chown hadoop /usr/local/kylin
cd /usr/local/kylin
-wget
http://www-us.apache.org/dist/kylin/apache-kylin-2.2.0/apache-kylin-2.2.0-bin-hbase1x.tar.gz
-tar –zxvf apache-kylin-2.2.0-bin-hbase1x.tar.gz
+wget
http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
+tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
```
+
+
### Configure Kylin
Before start Kylin, you need do a couple of configurations:
-- Copy "hbase.zookeeper.quorum" property from /etc/hbase/conf/hbase-site.xml
to $KYLIN\_HOME/conf/kylin\_job\_conf.xml, like this:
+- Copy `hbase.zookeeper.quorum` property from `/etc/hbase/conf/hbase-site.xml`
to `$KYLIN_HOME/conf/kylin_job_conf.xml`, like this:
-```
+```xml
<property>
<name>hbase.zookeeper.quorum</name>
<value>ip-nn-nn-nn-nn.ap-northeast-2.compute.internal</value>
</property>
```
-- Use HDFS as "kylin.env.hdfs-working-dir" (Recommended)
+- Use HDFS as `kylin.env.hdfs-working-dir` (Recommended)
-EMR recommends to **"use HDFS for intermediate data storage while the cluster
is running and Amazon S3 only to input the initial data and output the final
results"**. Kylin's 'hdfs-working-dir' is for putting the intermediate data for
Cube building, cuboid files and also some metadata files (like dictionary and
table snapshots which are not good in HBase); so it is best to configure HDFS
for this.
+EMR recommends to *use HDFS for intermediate data storage while the cluster is running and Amazon S3 only to input the initial data and output the final results*. Kylin's `hdfs-working-dir` is for putting the intermediate data for Cube building, cuboid files and also some metadata files (like dictionaries and table snapshots, which are not well suited to HBase); so it is best to configure HDFS for this.
If using HDFS as Kylin working directory, you just leave configurations
unchanged as EMR's default FS is HDFS:
-```
+```properties
kylin.env.hdfs-working-dir=/kylin
```
Before you shutdown/restart the cluster, you must backup the "/kylin" data on
HDFS to S3 with
[S3DistCp](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html),
or you may lost data and couldn't recover the cluster later.
-- Use S3 as "kylin.env.hdfs-working-dir"
+- Use S3 as `kylin.env.hdfs-working-dir`
If you want to use S3 as storage (assume HBase is also on S3), you need
configure the following parameters:
-```
+```properties
kylin.env.hdfs-working-dir=s3://yourbucket/kylin
kylin.storage.hbase.cluster-fs=s3://yourbucket
kylin.source.hive.redistribute-flat-table=false
@@ -94,7 +103,7 @@ The intermediate file and the HFile will all be written to
S3. The build perform
Some Hadoop configurations need be applied for better performance and data
consistency on S3, according to
[emr-troubleshoot-errors-io](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-errors-io.html)
-```
+```xml
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
@@ -145,11 +154,13 @@ Don't forget to enable the 7070 port access in the
security group for EMR master
Build the sample Cube, and then run queries when the Cube is ready. You can
browse S3 to see whether the data is safely persisted.
+
+
### Spark Configuration
EMR's Spark version may be incompatible with Kylin, so you couldn't directly
use EMR's Spark. You need to set "SPARK_HOME" environment variable to Kylin's
Spark folder (KYLIN_HOME/spark) before start Kylin. To access files on S3 or
EMRFS, we need to copy EMR's implementation jars to Spark.
-```
+```sh
export SPARK_HOME=$KYLIN_HOME/spark
cp /usr/lib/hadoop-lzo/lib/*.jar $KYLIN_HOME/spark/jars/
@@ -161,18 +172,21 @@ $KYLIN_HOME/bin/kylin.sh start
You can also copy EMR's spark-defauts configuration to Kylin's spark for a
better utilization of the cluster resources.
+
+
### Shut down EMR Cluster
Before you shut down EMR cluster, we suggest you take a backup for Kylin
metadata and upload it to S3.
To shut down an Amazon EMR cluster without losing data that hasn't been
written to Amazon S3, the MemStore cache needs to flush to Amazon S3 to write
new store files. To do this, you can run a shell script provided on the EMR
cluster.
-```
+```sh
bash /usr/lib/hbase/bin/disable_all_tables.sh
```
To restart a cluster with the same HBase data, specify the same Amazon S3
location as the previous cluster either in the AWS Management Console or using
the "hbase.rootdir" configuration property. For more information about EMR
HBase, refer to [HBase on Amazon
S3](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html)
+
## Deploy Kylin in a dedicated EC2
diff --git a/website/_docs/install/kylin_cluster.cn.md
b/website/_docs/install/kylin_cluster.cn.md
index a09aabb154..488925186b 100644
--- a/website/_docs/install/kylin_cluster.cn.md
+++ b/website/_docs/install/kylin_cluster.cn.md
@@ -16,7 +16,7 @@ Kylin 实例是无状态的服务,运行时的状态信息存储在 HBase meta
1. 配置相同的 `kylin.metadata.url` 值,即配置所有的 Kylin 节点使用同一个 HBase metastore。
2. 配置 Kylin 节点列表
`kylin.server.cluster-servers`,包括所有节点(包括当前节点),当事件变化时,接收变化的节点需要通知其他所有节点(包括当前节点)。
-3. 配置 Kylin 节点的运行模式 `kylin.server.mode`,参数值可选 `all`, `job`, `query` 中的一个,默认为
`all`。
+3. 配置 Kylin 节点的运行模式 `kylin.server.mode`,参数值可选 `all`, `job`, `query` 中的一个,默认值为
`all`。
`job` 模式代表该服务仅用于任务调度,不用于查询;`query` 模式代表该服务仅用于查询,不用于构建任务的调度;`all`
模式代表该服务同时用于任务调度和 SQL 查询。
> **注意:**默认情况下只有**一个实例**用于构建任务的调度 (即 `kylin.server.mode` 设置为 `all` 或者 `job`
> 模式),如果您需要配置多个节点进行任务构建,以满足高可用和高并发的需求,请参考 [Kylin
> 设置](/docs/install/configuration.html) 页中的**任务引擎高可用**的内容。
diff --git a/website/_docs/install/kylin_cluster.md
b/website/_docs/install/kylin_cluster.md
index 65ee0fe6e4..617e09be68 100644
--- a/website/_docs/install/kylin_cluster.md
+++ b/website/_docs/install/kylin_cluster.md
@@ -6,52 +6,34 @@ permalink: /docs/install/kylin_cluster.html
---
-### Kylin Server modes
+Kylin instances are stateless services; runtime state information is stored in the HBase metastore. For load balancing purposes, you can run multiple Kylin instances sharing one metastore, so that the nodes share the query load and back each other up, improving service availability. The following figure depicts a typical scenario for a Kylin cluster-mode deployment:
+
-Kylin instances are stateless, the runtime state is saved in its metadata
store in HBase (specified by `kylin.metadata.url` in `conf/kylin.properties`).
For load balance considerations it is recommended to run multiple Kylin
instances sharing the same metadata store, thus they share the same state on
table schemas, job status, Cube status, etc.
-Each of the Kylin instance has a "kylin.server.mode" entry in
`conf/kylin.properties` specifying the runtime mode, it has three options:
- * **job** : run job engine in this instance; Kylin job engine manages the
jobs to cluster;
- * **query** : run query engine only; Kylin query engine accepts and answers
your SQL queries;
- * **all** : run both job engine and query engines in this instance.
+### Kylin Node Configuration
- By default only one instance can run the job engine ("all" or "job" mode),
the others should be in the "query" mode.
+If you need to cluster multiple Kylin nodes, make sure they use the same Hadoop cluster and HBase cluster. Then do the following steps in each node's configuration file `$KYLIN_HOME/conf/kylin.properties`:
- If you want to run multiple job engines to get high availability or handle
heavy concurrent jobs, please check "Enable multiple job engines" in [Advanced
settings](advance_settings.html) page.
+1. Configure the same `kylin.metadata.url` value, so that all Kylin nodes use the same HBase metastore.
+2. Configure the Kylin node list `kylin.server.cluster-servers`, including all nodes (the current node included); when an event changes, the node receiving the change notifies all other nodes, including itself.
+3. Configure the running mode `kylin.server.mode` of the Kylin node. Optional values include `all`, `job` and `query`. The default value is *all*.
+The *job* mode means that the service is only used for job scheduling, not for queries; the *query* mode means that the service is only used for queries, not for scheduling build jobs; the *all* mode means the service is used for both job scheduling and queries.
+> *Note*: By default, only *one instance* is used for scheduling build jobs (i.e., `kylin.server.mode` is set to `all` or `job`). If you need to configure multiple job-mode nodes to meet high-availability and high-concurrency requirements, please refer to the *Job Engine High Availability* section of the [Kylin Settings](/docs/install/configuration.html) page.
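+
+For illustration, each node's `kylin.properties` might then contain something like the following (hostnames are placeholders; by default exactly one node would run in `all` or `job` mode):
+
+```properties
+kylin.metadata.url=kylin_metadata@hbase
+kylin.server.cluster-servers=host1:7070,host2:7070,host3:7070
+kylin.server.mode=query
+```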
-A typical scenario is depicted in the following chart:
-
-### Configure Multiple Kylin Servers
+### Installing a load balancer
-If you are running Kylin in a cluster where you have multiple Kylin server
instances, please make sure you have the following property correctly
configured in `conf/kylin.properties` for EVERY instance (both job and query).
+To send query requests to the cluster instead of a single node, you can deploy a load balancer such as [Nginx](http://nginx.org/en/), [F5](https://www.f5.com/) or [cloudlb](https://rubygems.org/gems/cloudlb/), so that clients communicate with the load balancer instead of with a specific Kylin instance.
- * `kylin.rest.servers`
- List of servers in use, this enables one instance to notify other
servers when there is event change. For example:
-```
-kylin.rest.servers=host1:7070,host2:7070
-```
- * `kylin.server.mode`
- By default, only one instance whose `kylin.server.mode` is set to "all"
or "job", the others be "query"
+### Read and write separation deployment
-```
-kylin.server.mode=all
-```
+For better stability and optimal performance, it is recommended to perform a
read-write separation deployment, deploying Kylin on two clusters as follows:
-### Setup Load Balancer
+* A Hadoop cluster used for *Cube building*, which can be a large cluster shared with other applications;
+* An HBase cluster used for *SQL queries*. Usually this cluster is dedicated to Kylin; it does not need as many nodes as the Hadoop cluster, and its HBase configuration can be optimized for Kylin's read-only Cube access.
-To enable Kylin service high availability, you need setup a load balancer in
front of these servers, letting it routes the incoming requests to the cluster.
Client side communicates with the load balancer, instead of with a specific
Kylin instance. The setup of load balancer is out of the scope; you may select
an implementation like Nginx, F5 or cloud LB service.
-
-
-### Configure Read/Write separated deployment
-
-Kylin can work with two clusters to gain better stability and performance:
-
- * A Hadoop cluster for Cube building; This can be a shared, large cluster.
- * A HBase cluster for SQL queries; Usually this is a dedicated cluster with
less nodes. The HBase configurations can be tuned for better read performance
as Cubes are immutable after built.
-
-This deployment has been adopted and verified by many large companies. It is
the best solution for production deployment as we know. For how to do this,
please refer to [Deploy Apache Kylin with Standalone HBase
Cluster](/blog/2016/06/10/standalone-hbase-cluster/)
\ No newline at end of file
+This deployment strategy is the best known deployment solution for production environments. For how to perform a read-write separation deployment, please refer to [Deploy Apache Kylin with Standalone HBase Cluster](/blog/2016/06/10/standalone-hbase-cluster/).
\ No newline at end of file
diff --git a/website/_docs/install/kylin_docker.cn.md
b/website/_docs/install/kylin_docker.cn.md
index a02218a184..e1b9e4dda6 100644
--- a/website/_docs/install/kylin_docker.cn.md
+++ b/website/_docs/install/kylin_docker.cn.md
@@ -7,4 +7,4 @@ version: v1.5.3
since: v1.5.2
---
-Apache Kylin 作为一个 Hadoop 集群的客户端运行, 因此运行在 Docker 容器中是合理的; 请查看 github
项目[kylin-docker](https://github.com/Kyligence/kylin-docker/).
+请查看 github 项目 [kylin-docker](https://github.com/Kyligence/kylin-docker/).
diff --git a/website/_docs/install/kylin_docker.md
b/website/_docs/install/kylin_docker.md
index 5d8661e3e3..cd5ae3277c 100644
--- a/website/_docs/install/kylin_docker.md
+++ b/website/_docs/install/kylin_docker.md
@@ -7,4 +7,4 @@ version: v1.5.3
since: v1.5.2
---
-Apache Kylin runs as a client of Hadoop cluster, so it is reasonable to run
within a Docker container; please check [this
project](https://github.com/Kyligence/kylin-docker/) on github.
+For more information, please refer to [this
project](https://github.com/Kyligence/kylin-docker/) on github.
diff --git a/website/_docs/install/manual_install_guide.cn.md
b/website/_docs/install/manual_install_guide.cn.md
deleted file mode 100644
index a2694a93f2..0000000000
--- a/website/_docs/install/manual_install_guide.cn.md
+++ /dev/null
@@ -1,29 +0,0 @@
----
-layout: docs-cn
-title: "手动安装指南"
-categories: 安装
-permalink: /cn/docs/install/manual_install_guide.html
-version: v0.7.2
-since: v0.7.1
----
-
-## 引言
-
-在大多数情况下,我们的自动脚本[Installation Guide](./index.html)可以帮助你在你的hadoop
sandbox甚至你的hadoop cluster中启动Kylin。但是,为防部署脚本出错,我们撰写本文作为参考指南来解决你的问题。
-
-基本上本文解释了自动脚本中的每一步骤。我们假设你已经对Linux上的Hadoop操作非常熟悉。
-
-## 前提条件
-* Kylin 二进制文件拷贝至本地并解压,之后使用$KYLIN_HOME引用
-`export KYLIN_HOME=/path/to/kylin`
-`cd $KYLIN_HOME`
-
-### 启动Kylin
-
-以`./bin/kylin.sh start`
-
-启动Kylin
-
-并以`./bin/Kylin.sh stop`
-
-停止Kylin
diff --git a/website/images/install/overwrite_config.png
b/website/images/install/override_config_cube.png
similarity index 100%
rename from website/images/install/overwrite_config.png
rename to website/images/install/override_config_cube.png
diff --git a/website/images/install/overwrite_config_project.png
b/website/images/install/override_config_project.png
similarity index 100%
rename from website/images/install/overwrite_config_project.png
rename to website/images/install/override_config_project.png
diff --git a/website/images/install/overwrite_config_v2.png
b/website/images/install/overwrite_config_v2.png
deleted file mode 100644
index 22c40a33fb..0000000000
Binary files a/website/images/install/overwrite_config_v2.png and /dev/null
differ