Author: lidong
Date: Fri Nov 17 14:01:38 2017
New Revision: 1815567

URL: http://svn.apache.org/viewvc?rev=1815567&view=rev
Log:
Add more to kylin on aws

Modified:
    kylin/site/docs21/install/kylin_aws_emr.html
    kylin/site/feed.xml

Modified: kylin/site/docs21/install/kylin_aws_emr.html
URL: http://svn.apache.org/viewvc/kylin/site/docs21/install/kylin_aws_emr.html?rev=1815567&r1=1815566&r2=1815567&view=diff
==============================================================================
--- kylin/site/docs21/install/kylin_aws_emr.html (original)
+++ kylin/site/docs21/install/kylin_aws_emr.html Fri Nov 17 14:01:38 2017
@@ -3204,7 +3204,7 @@
 <p>You can select “HDFS” or “S3” as the storage for HBase, depending on whether you need Cube data be persisted after shutting down the cluster. EMR HDFS uses the local disk of EC2 instances, which will erase the data when cluster is stopped, then Kylin metadata and Cube data can be lost.</p>
 
-<p>If you use “S3” as HBase’s storage, you need customize its configuration for “hbase.rpc.timeout”, because the bulk load to S3 is a copy operation, when data size is huge, HBase region server need wait much longer time than on HDFS to finish.</p>
+<p>If you use “S3” as HBase’s storage, you need customize its configuration for “<strong>hbase.rpc.timeout</strong>”, because the bulk load to S3 is a copy operation, when data size is huge, HBase region server need wait much longer to finish than on HDFS.</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>[
   {
     "Classification": "hbase-site",
@@ -3254,34 +3254,71 @@ tar -zxvf apache-kylin-2.2.0-bin-hb
   <li>Use HDFS as “kylin.env.hdfs-working-dir”</li>
 </ul>
 
-<p>If using HDFS as Kylin working directory, you can leave configurations unchanged as EMR’s default FS is HDFS:</p>
+<p>EMR recommends to “use HDFS for intermediate data storage while the cluster is running and Amazon S3 only to input the initial data and output the final results”.</p>
+
+<p>If using HDFS as Kylin working directory, you just leave configurations unchanged as EMR’s default FS is HDFS:</p>
 
 <div class="highlighter-rouge"><pre 
class="highlight"><code>kylin.env.hdfs-working-dir=/kylin
 </code></pre>
 </div>
 
-<p>This will be very similar as on-premises deployment.</p>
+<p>Before you shudown/restart the cluster, you can backup the data on HDFS to S3 with <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html">S3DistCp</a>.</p>
 
 <ul>
   <li>Use S3 as “kylin.env.hdfs-working-dir”</li>
 </ul>
 
-<p>Configure the following 2 parameters:</p>
+<p>If you want to totally use S3 as storage (assume HBase is also on S3), configure the following 2 parameters:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>kylin.env.hdfs-working-dir=s3://yourbucket/kylin
 kylin.storage.hbase.cluster-fs=s3://yourbucket
 </code></pre>
 </div>
 
-<p>Then Kylin will use S3 for Cube building, big metadata file and Cube. The performance might be slower than HDFS.</p>
+
+<p>The intermediate file and the HFile will all be written to S3. The build performance should be slower than HDFS. Make sure you have a good understanding about the difference between S3 and HDFS.</p>
+
+<ul>
+  <li>Hadoop configurations</li>
+</ul>
+
+<p>Some Hadoop configurations need be applied for better performance and data consistency on S3, according to <a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-errors-io.html">emr-troubleshoot-errors-io</a></p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><property>
+  <name>io.file.buffer.size</name>
+  <value>65536</value>
+</property>
+<property>
+  <name>mapred.map.tasks.speculative.execution</name>
+  <value>false</value>
+</property>
+<property>
+  <name>mapred.reduce.tasks.speculative.execution</name>
+  <value>false</value>
+</property>
+<property>
+  <name>mapreduce.map.speculative</name>
+  <value>false</value>
+</property>
+<property>
+  <name>mapreduce.reduce.speculative</name>
+  <value>false</value>
+</property>
+
+</code></pre>
+</div>
 
 <ul>
   <li>Create the working-dir folder if it doesn’t exist</li>
 </ul>
 
 <div 
class="highlighter-rouge"><pre class="highlight"><code>hadoop fs -mkdir /kylin
-or
-hadoop fs -mkdir s3://yourbucket/kylin
+</code></pre>
+</div>
+
+<p>or</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>hadoop fs -mkdir s3://yourbucket/kylin
 </code></pre>
 </div>
 

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1815567&r1=1815566&r2=1815567&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Nov 17 14:01:38 2017
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 15 Nov 2017 19:16:44 -0800</pubDate>
-    <lastBuildDate>Wed, 15 Nov 2017 19:16:44 -0800</lastBuildDate>
+    <pubDate>Fri, 17 Nov 2017 05:59:10 -0800</pubDate>
+    <lastBuildDate>Fri, 17 Nov 2017 05:59:10 -0800</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>