Author: lidong
Date: Sun Jan 5 14:10:51 2020
New Revision: 1872352
URL: http://svn.apache.org/viewvc?rev=1872352&view=rev
Log:
Modify AWS EMR doc and AWS Glue doc
Modified:
kylin/site/cn/docs31/install/kylin_aws_emr.html
kylin/site/docs31/install/kylin_aws_emr.html
kylin/site/feed.xml
Modified: kylin/site/cn/docs31/install/kylin_aws_emr.html
URL:
http://svn.apache.org/viewvc/kylin/site/cn/docs31/install/kylin_aws_emr.html?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/cn/docs31/install/kylin_aws_emr.html (original)
+++ kylin/site/cn/docs31/install/kylin_aws_emr.html Sun Jan 5 14:10:51 2020
@@ -182,8 +182,8 @@ var _hmt = _hmt || [];
<h3 id="section">Recommended Version</h3>
<ul>
- <li>AWS EMR 5.7 (for EMR 5.8 and above, see <a href="https://issues.apache.org/jira/browse/KYLIN-3129">KYLIN-3129</a>)</li>
- <li>Apache Kylin v2.2.0 or above for HBase 1.x</li>
+ <li>AWS EMR 5.27</li>
+ <li>Apache Kylin v3.0.0 or above for HBase 1.x</li>
</ul>
<h3 id="emr-">Start the EMR Cluster</h3>
@@ -194,23 +194,32 @@ var _hmt = _hmt || [];
<p>If you use S3 as the storage for HBase, you need to customize the <code class="highlighter-rouge">hbase.rpc.timeout</code> setting: a bulk load to S3 is a copy operation, so when the data size is large, the HBase region server waits much longer for it to finish than on HDFS.</p>
-<div class="highlighter-rouge"><pre class="highlight"><code>[ {
- "Classification": "hbase-site",
- "Properties": {
- "hbase.rpc.timeout": "3600000",
- "hbase.rootdir": "s3://yourbucket/EMRROOT"
- }
- },
- {
- "Classification": "hbase",
- "Properties": {
- "hbase.emr.storageMode": "s3"
- }
- }
-]
+<p>If you want Hive on EMR to use an external metastore, consider RDS or AWS Glue. This lets you build a stateless OLAP service in the cloud.</p>
+
+<p>Let's create an EMR cluster via the AWS CLI with the following enabled (each item is optional):<br />
+1. S3 as the HBase data storage<br />
+2. AWS Glue as the Hive metastore<br />
+3. S3 metadata consistency, to prevent data files from being lost</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr
create-cluster --applications Name=Hadoop Name=Hive Name=Pig Name=HBase
Name=Spark Name=Sqoop Name=Tez Name=ZooKeeper \
+ --release-label emr-5.28.0 \
+ --instance-groups
'[{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":50,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.xlarge","Name":"Worker
Node"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m5.xlarge","Name":"Master
Node"}]' \
+ --configurations
'[{"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},{"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://{S3_BUCKET}/hbase/data","hbase.rpc.timeout":
"3600000"}},{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]'
\
+ --name 'Kylin3.0Cluster_Original' \
+ --emrfs Consistent=true \
+ --region cn-northwest-1
</code></pre>
</div>
+<h3 id="aws-gluehive">Support AWS Glue as the Hive Metastore</h3>
+
+<p>If you need to enable Glue as the Hive metastore, refer to <code class="highlighter-rouge">https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore</code> to build the package. You need the following jars:</p>
+
+<ol>
+ <li>aws-glue-datacatalog-client-common-xxx.jar</li>
+ <li>aws-glue-datacatalog-hive2-client-xxx.jar</li>
+</ol>
+
<h3 id="kylin">Install Kylin</h3>
<p>When the EMR cluster is in "Waiting" status, you can SSH into the master node, download Kylin, and then uncompress the tar package:</p>
@@ -218,8 +227,8 @@ var _hmt = _hmt || [];
<div class="highlighter-rouge"><pre class="highlight"><code>sudo mkdir
/usr/local/kylin
sudo chown hadoop /usr/local/kylin
<span class="nb">cd</span> /usr/local/kylin
-wget
http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
-tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
+wget
http://mirror.bit.edu.cn/apache/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-hbase1x.tar.gz
+tar -zxvf apache-kylin-3.0.0-bin-hbase1x.tar.gz
</code></pre>
</div>
@@ -314,35 +323,88 @@ tar -zxvf apache-kylin-2.5.0-bin-hbase1x
</code></pre>
</div>
-<h3 id="kylin-2">Start Kylin</h3>
+<h3 id="section-1">Resolve Jar Conflicts</h3>
-<p>Starting Kylin is the same as on a normal Hadoop cluster:</p>
+<ul>
+ <li>Add the following to ~/.bashrc</li>
+</ul>
-<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">export </span><span class="nv">KYLIN_HOME</span><span
class="o">=</span>/usr/local/kylin/apache-kylin-2.2.0-bin
-<span class="nv">$KYLIN_HOME</span>/bin/sample.sh
-<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">export </span><span class="nv">HIVE_HOME</span><span
class="o">=</span>/usr/lib/hive
+<span class="nb">export </span><span class="nv">HADOOP_HOME</span><span
class="o">=</span>/usr/lib/hadoop
+<span class="nb">export </span><span class="nv">HBASE_HOME</span><span
class="o">=</span>/usr/lib/hbase
+<span class="nb">export </span><span class="nv">SPARK_HOME</span><span
class="o">=</span>/usr/lib/spark
+
+<span class="nb">export </span><span class="nv">KYLIN_HOME</span><span
class="o">=</span>/home/ec2-user/apache-kylin-3.0.0-SNAPSHOT-bin
+<span class="nb">export </span><span class="nv">HCAT_HOME</span><span
class="o">=</span>/usr/lib/hive-hcatalog
+<span class="nb">export </span><span class="nv">KYLIN_CONF_HOME</span><span
class="o">=</span><span class="nv">$KYLIN_HOME</span>/conf
+<span class="nb">export </span><span class="nv">tomcat_root</span><span
class="o">=</span><span class="nv">$KYLIN_HOME</span>/tomcat
+<span class="nb">export </span><span class="nv">hive_dependency</span><span
class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span
class="nv">$HIVE_HOME</span>/lib/:<span
class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:<span
class="nv">$SPARK_HOME</span>/jars/
+<span class="nb">export </span><span class="nv">PATH</span><span
class="o">=</span><span class="nv">$KYLIN_HOME</span>/bin:<span
class="nv">$PATH</span>
+
+<span class="nb">export </span><span class="nv">hive_dependency</span><span
class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span
class="nv">$HIVE_HOME</span>/lib/<span class="k">*</span>:<span
class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:/usr/share/aws/hmclient/lib/<span
class="k">*</span>:<span class="nv">$SPARK_HOME</span>/jars/<span
class="k">*</span>:<span class="nv">$HBASE_HOME</span>/lib/<span
class="k">*</span>.jar:<span class="nv">$HBASE_HOME</span>/<span
class="k">*</span>.jar
</code></pre>
</div>
-<p>Don't forget to enable access to port 7070 in the security group of the EMR master, "ElasticMapReduce-master", or connect to the master node over SSH; you can then open the Kylin Web GUI at <code class="highlighter-rouge">http://<master-dns>:7070/kylin</code>.</p>
+<ul>
+ <li>Temporarily remove joda.jar</li>
+</ul>
-<p>Build the sample Cube, and run queries once the Cube is ready. You can browse S3 to check whether the data has been safely persisted.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>mv <span
class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar <span
class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar.backup
+</code></pre>
+</div>
+
+<ul>
+ <li>Modify bin/kylin.sh</li>
+</ul>
+
+<p>Add the following at the top of bin/kylin.sh</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">export </span><span class="nv">HBASE_CLASSPATH_PREFIX</span><span
class="o">=</span><span class="k">${</span><span
class="nv">tomcat_root</span><span class="k">}</span>/bin/bootstrap.jar:<span
class="k">${</span><span class="nv">tomcat_root</span><span
class="k">}</span>/bin/tomcat-juli.jar:<span class="k">${</span><span
class="nv">tomcat_root</span><span class="k">}</span>/lib/<span
class="k">*</span>:<span class="nv">$hive_dependency</span>:<span
class="nv">$HBASE_CLASSPATH_PREFIX</span>
+</code></pre>
+</div>
+
+<h3 id="gluehive">Enable Glue as the Hive Data Source (Optional)</h3>
+<ul>
+ <li>Put <code class="highlighter-rouge">aws-glue-datacatalog-client-common-xxx.jar</code> and <code class="highlighter-rouge">aws-glue-datacatalog-hive2-client-xxx.jar</code> under the <code class="highlighter-rouge">$KYLIN_HOME/lib</code> directory</li>
+ <li>In <code class="highlighter-rouge">kylin.properties</code>, set <code class="highlighter-rouge">kylin.source.hive.metadata-type=gluecatalog</code></li>
+</ul>
+
+<h3 id="spark">Configure Spark</h3>
-<h3 id="spark-">Spark Configuration</h3>
+<ul>
+ <li>Package Spark</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>rm -rf <span
class="nv">$KYLIN_HOME</span>/spark_jars
+mkdir <span class="nv">$KYLIN_HOME</span>/spark_jars
+cp /usr/lib/spark/jars/<span class="k">*</span>.jar <span
class="nv">$KYLIN_HOME</span>/spark_jars
+cp -f /usr/lib/hbase/lib/<span class="k">*</span>.jar <span
class="nv">$KYLIN_HOME</span>/spark_jars
+
+rm -f netty-3.9.9.Final.jar
+rm -f netty-all-4.1.8.Final.jar
+
+jar cv0f spark-libs.jar -C <span class="nv">$KYLIN_HOME</span>/spark_jars .
+aws s3 cp spark-libs.jar s3://<span class="o">{</span>YOUR_BUCKET<span class="o">}</span>/kylin/package/ <span class="c"># if you chose S3 as your working-dir</span>
+hadoop fs -put spark-libs.jar hdfs://kylin/package/ <span class="c"># if you chose HDFS as your working-dir</span>
+</code></pre>
+</div>
-<p>The Spark version on EMR is likely to differ from the one Kylin was compiled against, so you usually cannot use the Spark packaged with EMR for Kylin jobs. Before starting Kylin, set the "SPARK_HOME" environment variable to Kylin's Spark subdirectory (KYLIN_HOME/spark). In addition, to access files on S3 or EMRFS from Spark, you need to copy EMR's extension classes from the EMR directories into Kylin's Spark.</p>
+<ul>
+ <li>In <code class="highlighter-rouge">kylin.properties</code>, set <code class="highlighter-rouge">kylin.engine.spark-conf.spark.yarn.archive=PATH_TO_SPARK_LIB</code></li>
+</ul>
-<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">export </span><span class="nv">SPARK_HOME</span><span
class="o">=</span><span class="nv">$KYLIN_HOME</span>/spark
+<h3 id="kylin-2">Start Kylin</h3>
-cp /usr/lib/hadoop-lzo/lib/<span class="k">*</span>.jar <span
class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-<span
class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/lib/hadoop/hadoop-common<span class="k">*</span>-amzn-<span
class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
+<p>Starting Kylin is the same as on a normal Hadoop cluster:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nv">$KYLIN_HOME</span>/bin/sample.sh
<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
</code></pre>
</div>
-<p>You can also refer to the spark-defaults of EMR Spark when setting Kylin's Spark configuration, to make better use of the cluster resources.</p>
+<p>Don't forget to enable access to port 7070 in the security group of the EMR master, "ElasticMapReduce-master", or connect to the master node over SSH; you can then open the Kylin Web GUI at <code class="highlighter-rouge">http://<master-dns>:7070/kylin</code>.</p>
+
+<p>Build the sample Cube, and run queries once the Cube is ready. You can browse S3 to check whether the data has been safely persisted.</p>
<h3 id="emr--1">Shut Down the EMR Cluster</h3>
@@ -356,10 +418,21 @@ cp /usr/lib/hadoop/hadoop-common<span cl
<p>To restart a cluster with the same HBase data, specify the same Amazon S3 location as the previous cluster, either in the AWS Management Console or with the <code class="highlighter-rouge">hbase.rootdir</code> configuration property. For more information about EMR HBase, refer to <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html">HBase on Amazon S3</a></p>
-<h2 id="ec2--kylin">Deploy Kylin on a Dedicated EC2 Instance</h2>
+<h3 id="ec2--kylin">Deploy Kylin on a Dedicated EC2 Instance</h3>
<p>Running Kylin on a dedicated client node (not master, core, or task) is recommended. Start a separate EC2 instance in the same VPC and subnet as your EMR cluster, copy the Hadoop clients from the master node to that instance, and then install Kylin on it. This improves the stability of both Kylin itself and the services on the master node.</p>
+<h3 id="section-2">Other Issues</h3>
+
+<p>If you configure S3 as your working-dir and encounter a "Wrong FS" exception, try modifying <code class="highlighter-rouge">$KYLIN_HOME/conf/kylin_hive_conf.xml</code>, <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>, and <code class="highlighter-rouge">/etc/hadoop/conf/core-site.xml</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code> <span
class="nt"><property></span>
+ <span class="nt"><name></span>fs.defaultFS<span
class="nt"></name></span>
+ <span class="nt"><value></span>s3://{YOUR_BUCKET}<span
class="nt"></value></span>
+ <span
class="c"><!--<value>hdfs://ip-172-31-6-58.cn-northwest-1.compute.internal:8020</value>--></span>
+ <span class="nt"></property></span>
+</code></pre>
+</div>
</article>
</div>
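The Spark packaging step in the diff above removes the two netty jars with a bare `rm -f`, which only takes effect if the current directory happens to be `$KYLIN_HOME/spark_jars`. Below is a minimal sketch of the staging step, assuming the intent is to remove those two jars from the staging directory. The function name `stage_spark_jars` and its three arguments are hypothetical, and the `jar`, `aws s3 cp`, and `hadoop fs -put` commands from the doc are left out.

```shell
# Hypothetical helper: stage_spark_jars <spark_jars_dir> <hbase_lib_dir> <staging_dir>
# Copies the Spark and HBase jars into one staging directory, then removes the
# two netty jars named in the doc.
# Assumption: the doc's bare `rm -f` is meant to act on the staging directory,
# not on whatever the current directory is.
stage_spark_jars() {
  # Start from an empty staging directory
  rm -rf "$3"
  mkdir -p "$3"

  cp "$1"/*.jar "$3"/
  # -f overwrites jars that Spark and HBase both ship
  cp -f "$2"/*.jar "$3"/

  rm -f "$3/netty-3.9.9.Final.jar" "$3/netty-all-4.1.8.Final.jar"
}
```

With the paths used in the doc, the call would be `stage_spark_jars /usr/lib/spark/jars /usr/lib/hbase/lib "$KYLIN_HOME/spark_jars"`, followed by the doc's `jar cv0f spark-libs.jar -C $KYLIN_HOME/spark_jars .` command.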
Modified: kylin/site/docs31/install/kylin_aws_emr.html
URL:
http://svn.apache.org/viewvc/kylin/site/docs31/install/kylin_aws_emr.html?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/docs31/install/kylin_aws_emr.html (original)
+++ kylin/site/docs31/install/kylin_aws_emr.html Sun Jan 5 14:10:51 2020
@@ -7050,35 +7050,42 @@ var _hmt = _hmt || [];
<h3 id="recommended-version">Recommended Version</h3>
<ul>
- <li>AWS EMR 5.7 (for EMR 5.8 and above, please refer to <a
href="https://issues.apache.org/jira/browse/KYLIN-3129">KYLIN-3129</a>)</li>
- <li>Apache Kylin v2.2.0 or above for HBase 1.x</li>
+ <li>AWS EMR 5.27 or later</li>
+ <li>Apache Kylin v3.0.0 or above for HBase 1.x</li>
</ul>
<h3 id="start-emr-cluster">Start EMR cluster</h3>
<p>Launch an EMR cluster with the AWS web console, command line, or API. Select <em>HBase</em> in the applications, as Kylin needs the HBase service.</p>
-<p>You can select "HDFS" or "S3" as the storage for HBase, depending on whether you need Cube data be persisted after shutting down the cluster. EMR HDFS uses the local disk of EC2 instances, which will erase the data when cluster is stopped, then Kylin metadata and Cube data can be lost.</p>
+<p>You can choose "HDFS" or "S3" as the storage for HBase, depending on whether you need Cube data to persist after the cluster shuts down. EMR HDFS uses the local disks of the EC2 instances, which are erased when the cluster is stopped, so Kylin metadata and Cube data will be lost.<br />
+If you use S3 as HBase's storage, you need to customize the <code class="highlighter-rouge">hbase.rpc.timeout</code> setting: a bulk load to S3 is a copy operation, so when the data size is large, the HBase region server has to wait much longer for it to finish than on HDFS.<br />
+If you want the Hive metadata to persist outside the EMR cluster, you can use AWS Glue or RDS as the Hive metastore. This lets you build a stateless OLAP service with Kylin in the cloud.</p>
-<p>If you use S3 as HBase's storage, you need customize its configuration for <code class="highlighter-rouge">hbase.rpc.timeout</code>, because the bulk load to S3 is a copy operation, when data size is huge, HBase region server need wait much longer to finish than on HDFS.</p>
+<p>Let's create a demo EMR cluster via the AWS CLI, with:<br />
+1. S3 as HBase storage (optional)<br />
+2. Glue as the Hive metastore (optional)<br />
+3. S3 metadata consistency enabled, so that data files are not lost (optional)</p>
-<div class="highlighter-rouge"><pre class="highlight"><code>[ {
- "Classification": "hbase-site",
- "Properties": {
- "hbase.rpc.timeout": "3600000",
- "hbase.rootdir": "s3://yourbucket/EMRROOT"
- }
- },
- {
- "Classification": "hbase",
- "Properties": {
- "hbase.emr.storageMode": "s3"
- }
- }
-]
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr
create-cluster --applications Name=Hadoop Name=Hive Name=Pig Name=HBase
Name=Spark Name=Sqoop Name=Tez Name=ZooKeeper \
+ --release-label emr-5.28.0 \
+ --instance-groups
'[{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":50,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.xlarge","Name":"Worker
Node"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m5.xlarge","Name":"Master
Node"}]' \
+ --configurations
'[{"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},{"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://{S3_BUCKET}/hbase/data","hbase.rpc.timeout":
"3600000"}},{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]'
\
+ --name 'Kylin3.0Cluster_Original' \
+ --emrfs Consistent=true \
+ --region cn-northwest-1
</code></pre>
</div>
+<h3 id="support-glue-as-metadata-of-hive">Support Glue as metadata of Hive</h3>
+
+<p>To read Hive metadata from Glue, refer to <code class="highlighter-rouge">https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore</code> and build the following two jars.</p>
+
+<ol>
+ <li>aws-glue-datacatalog-client-common-xxx.jar</li>
+ <li>aws-glue-datacatalog-hive2-client-xxx.jar</li>
+</ol>
+
<h3 id="install-kylin">Install Kylin</h3>
<p>When the EMR cluster is in "Waiting" status, you can SSH into its master node, download Kylin, and then uncompress the tarball:</p>
@@ -7086,8 +7093,8 @@ var _hmt = _hmt || [];
<div class="highlighter-rouge"><pre class="highlight"><code>sudo mkdir
/usr/local/kylin
sudo chown hadoop /usr/local/kylin
<span class="nb">cd</span> /usr/local/kylin
-wget
http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
-tar -zxvf apache-kylin-2.5.0-bin-hbase1x.tar.gz
+wget
http://mirror.bit.edu.cn/apache/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-hbase1x.tar.gz
+tar -zxvf apache-kylin-3.0.0-bin-hbase1x.tar.gz
</code></pre>
</div>
@@ -7181,35 +7188,85 @@ tar -zxvf apache-kylin-2.5.0-bin-hbase1x
</code></pre>
</div>
-<h3 id="start-kylin">Start Kylin</h3>
+<h3 id="solve-jar-conflict">Solve jar conflict</h3>
+<ul>
+ <li>Add the following environment variables to ~/.bashrc</li>
+</ul>
-<p>The start is the same as on normal Hadoop:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">export </span><span class="nv">HIVE_HOME</span><span
class="o">=</span>/usr/lib/hive
+<span class="nb">export </span><span class="nv">HADOOP_HOME</span><span
class="o">=</span>/usr/lib/hadoop
+<span class="nb">export </span><span class="nv">HBASE_HOME</span><span
class="o">=</span>/usr/lib/hbase
+<span class="nb">export </span><span class="nv">SPARK_HOME</span><span
class="o">=</span>/usr/lib/spark
+
+<span class="nb">export </span><span class="nv">KYLIN_HOME</span><span
class="o">=</span>/home/ec2-user/apache-kylin-3.0.0-SNAPSHOT-bin
+<span class="nb">export </span><span class="nv">HCAT_HOME</span><span
class="o">=</span>/usr/lib/hive-hcatalog
+<span class="nb">export </span><span class="nv">KYLIN_CONF_HOME</span><span
class="o">=</span><span class="nv">$KYLIN_HOME</span>/conf
+<span class="nb">export </span><span class="nv">tomcat_root</span><span
class="o">=</span><span class="nv">$KYLIN_HOME</span>/tomcat
+<span class="nb">export </span><span class="nv">hive_dependency</span><span
class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span
class="nv">$HIVE_HOME</span>/lib/:<span
class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:<span
class="nv">$SPARK_HOME</span>/jars/
+<span class="nb">export </span><span class="nv">PATH</span><span
class="o">=</span><span class="nv">$KYLIN_HOME</span>/bin:<span
class="nv">$PATH</span>
-<div class="highlighter-rouge"><pre class="highlight"><code>export
KYLIN_HOME=/usr/local/kylin/apache-kylin-2.2.0-bin
-$KYLIN_HOME/bin/sample.sh
-$KYLIN_HOME/bin/kylin.sh start
+<span class="nb">export </span><span class="nv">hive_dependency</span><span
class="o">=</span><span class="nv">$HIVE_HOME</span>/conf:<span
class="nv">$HIVE_HOME</span>/lib/<span class="k">*</span>:<span
class="nv">$HIVE_HOME</span>/lib/hive-hcatalog-core.jar:/usr/share/aws/hmclient/lib/<span
class="k">*</span>:<span class="nv">$SPARK_HOME</span>/jars/<span
class="k">*</span>:<span class="nv">$HBASE_HOME</span>/lib/<span
class="k">*</span>.jar:<span class="nv">$HBASE_HOME</span>/<span
class="k">*</span>.jar
</code></pre>
</div>
-<p>Don't forget to enable the 7070 port access in the security group for EMR master - "ElasticMapReduce-master", or with SSH tunnel to the master node, then you can access Kylin Web GUI at http://<master-dns>:7070/kylin</p>
+<ul>
+ <li>Remove joda.jar</li>
+</ul>
-<p>Build the sample Cube, and then run queries when the Cube is ready. You can
browse S3 to see whether the data is safely persisted.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>mv <span
class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar <span
class="nv">$HIVE_HOME</span>/lib/jackson-datatype-joda-2.4.6.jar.backup
+</code></pre>
+</div>
+
+<ul>
+ <li>Modify bin/kylin.sh<br />
+Add the following line at the top of bin/kylin.sh</li>
+</ul>
-<h3 id="spark-configuration">Spark Configuration</h3>
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">export </span><span class="nv">HBASE_CLASSPATH_PREFIX</span><span
class="o">=</span><span class="k">${</span><span
class="nv">tomcat_root</span><span class="k">}</span>/bin/bootstrap.jar:<span
class="k">${</span><span class="nv">tomcat_root</span><span
class="k">}</span>/bin/tomcat-juli.jar:<span class="k">${</span><span
class="nv">tomcat_root</span><span class="k">}</span>/lib/<span
class="k">*</span>:<span class="nv">$hive_dependency</span>:<span
class="nv">$HBASE_CLASSPATH_PREFIX</span>
+</code></pre>
+</div>
-<p>EMR's Spark version may be incompatible with Kylin, so you couldn't directly use EMR's Spark. You need to set "SPARK_HOME" environment variable to Kylin's Spark folder (KYLIN_HOME/spark) before start Kylin. To access files on S3 or EMRFS, we need to copy EMR's implementation jars to Spark.</p>
+<h3 id="enable-glue-as-metadata-for-hiveoptional">Enable Glue as metadata for Hive (Optional)</h3>
+<ol>
+ <li>Put <code
class="highlighter-rouge">aws-glue-datacatalog-client-common-xxx.jar</code> and
<code
class="highlighter-rouge">aws-glue-datacatalog-hive2-client-xxx.jar</code>
under $KYLIN_HOME/lib.</li>
+ <li>Set <code
class="highlighter-rouge">kylin.source.hive.metadata-type=gluecatalog</code> in
<code class="highlighter-rouge">kylin.properties</code></li>
+</ol>
-<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">export </span><span class="nv">SPARK_HOME</span><span
class="o">=</span><span class="nv">$KYLIN_HOME</span>/spark
+<h3 id="configure-spark">Configure Spark</h3>
-cp /usr/lib/hadoop-lzo/lib/<span class="k">*</span>.jar <span
class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-<span
class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
-cp /usr/lib/hadoop/hadoop-common<span class="k">*</span>-amzn-<span
class="k">*</span>.jar <span class="nv">$KYLIN_HOME</span>/spark/jars/
+<ul>
+ <li>Build a flat jar of the Spark libraries</li>
+</ul>
+<div class="highlighter-rouge"><pre class="highlight"><code>rm -rf <span
class="nv">$KYLIN_HOME</span>/spark_jars
+mkdir <span class="nv">$KYLIN_HOME</span>/spark_jars
+cp /usr/lib/spark/jars/<span class="k">*</span>.jar <span
class="nv">$KYLIN_HOME</span>/spark_jars
+cp -f /usr/lib/hbase/lib/<span class="k">*</span>.jar <span
class="nv">$KYLIN_HOME</span>/spark_jars
+
+rm -f netty-3.9.9.Final.jar
+rm -f netty-all-4.1.8.Final.jar
+
+jar cv0f spark-libs.jar -C <span class="nv">$KYLIN_HOME</span>/spark_jars .
+aws s3 cp spark-libs.jar s3://<span class="o">{</span>YOUR_BUCKET<span class="o">}</span>/kylin/package/ <span class="c"># if you chose S3 as your working-dir</span>
+hadoop fs -put spark-libs.jar hdfs://kylin/package/ <span class="c"># if you chose HDFS as your working-dir</span>
+</code></pre>
+</div>
+<ul>
+ <li>Set <code
class="highlighter-rouge">kylin.engine.spark-conf.spark.yarn.archive=PATH_TO_SPARK_LIB</code>
in <code class="highlighter-rouge">kylin.properties</code></li>
+</ul>
+
+<h3 id="start-kylin">Start Kylin</h3>
+
+<p>Starting Kylin is the same as on a normal Hadoop cluster:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nv">$KYLIN_HOME</span>/bin/sample.sh
<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh start
</code></pre>
</div>
-<p>You can also copy EMR's spark-defaults configuration to Kylin's spark for a better utilization of the cluster resources.</p>
+<p>Don't forget to enable access to port 7070 in the security group for the EMR master, "ElasticMapReduce-master", or set up an SSH tunnel to the master node; you can then access the Kylin Web GUI at http://<master-dns>:7070/kylin</p>
+
+<p>Build the sample Cube, and then run queries when the Cube is ready. You can
browse S3 to see whether the data is safely persisted.</p>
<h3 id="shut-down-emr-cluster">Shut down EMR Cluster</h3>
@@ -7223,10 +7280,23 @@ cp /usr/lib/hadoop/hadoop-common<span cl
<p>To restart a cluster with the same HBase data, specify the same Amazon S3 location as the previous cluster either in the AWS Management Console or using the "hbase.rootdir" configuration property. For more information about EMR HBase, refer to <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html">HBase on Amazon S3</a></p>
-<h2 id="deploy-kylin-in-a-dedicated-ec2">Deploy Kylin in a dedicated EC2</h2>
+<h3 id="deploy-kylin-in-a-dedicated-ec2">Deploy Kylin in a dedicated EC2</h3>
<p>Running Kylin in a dedicated client node (not master, core or task) is
recommended. You can start a separate EC2 instance within the same VPC and
subnet as your EMR, copy the Hadoop clients from master node to it, and then
install Kylin in it. This can improve the stability of services in master node
as well as Kylin itself.</p>
+<h3 id="trouble-shotting">Troubleshooting</h3>
+
+<ul>
+ <li>If you set S3 as your working-dir and see a "Wrong FS" exception in kylin.log (when the shrunken dictionary is enabled), try modifying the following property in $KYLIN_HOME/conf/kylin_hive_conf.xml, /etc/hive/conf/hive-site.xml, and /etc/hadoop/conf/core-site.xml.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code> <span
class="nt"><property></span>
+ <span class="nt"><name></span>fs.defaultFS<span
class="nt"></name></span>
+ <span class="nt"><value></span>s3://{YOUR_BUCKET}<span
class="nt"></value></span>
+ <span
class="c"><!--<value>hdfs://ip-172-31-6-58.cn-northwest-1.compute.internal:8020</value>--></span>
+ <span class="nt"></property></span>
+</code></pre>
+</div>
</article>
</div>
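The diff above asks for two settings in `kylin.properties`: `kylin.source.hive.metadata-type=gluecatalog` and `kylin.engine.spark-conf.spark.yarn.archive=PATH_TO_SPARK_LIB`. Below is a minimal sketch of applying such settings from the shell, assuming the file uses plain `key=value` lines. The helper name `set_kylin_prop` is hypothetical and is not part of Kylin.

```shell
# Hypothetical helper: set_kylin_prop <properties_file> <key> <value>
# Rewrites the line whose key matches exactly, or appends key=value if the
# key is not present. Assumes plain key=value lines with no spaces around "=".
set_kylin_prop() {
  # Create the file if it does not exist yet
  [ -f "$1" ] || : > "$1"
  awk -v k="$2" -v v="$3" '
    BEGIN { FS = "="; done = 0 }
    $1 == k { print k "=" v; done = 1; next }   # replace the existing entry
    { print }                                   # keep every other line as-is
    END { if (!done) print k "=" v }            # append when the key was absent
  ' "$1" > "$1.tmp.$$" && mv "$1.tmp.$$" "$1"
}
```

For example, `set_kylin_prop "$KYLIN_HOME/conf/kylin.properties" kylin.source.hive.metadata-type gluecatalog` sets the first property. For the second, pass the location the `spark-libs.jar` archive was uploaded to in place of `PATH_TO_SPARK_LIB`.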
Modified: kylin/site/feed.xml
URL:
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1872352&r1=1872351&r2=1872352&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Sun Jan 5 14:10:51 2020
@@ -19,8 +19,8 @@
<description>Apache Kylin Home</description>
<link>http://kylin.apache.org/</link>
<atom:link href="http://kylin.apache.org/feed.xml" rel="self"
type="application/rss+xml"/>
- <pubDate>Fri, 03 Jan 2020 05:59:19 -0800</pubDate>
- <lastBuildDate>Fri, 03 Jan 2020 05:59:19 -0800</lastBuildDate>
+ <pubDate>Sun, 05 Jan 2020 05:59:15 -0800</pubDate>
+ <lastBuildDate>Sun, 05 Jan 2020 05:59:15 -0800</lastBuildDate>
<generator>Jekyll v2.5.3</generator>
<item>