Author: lidong
Date: Fri Nov 17 14:01:38 2017
New Revision: 1815567

URL: http://svn.apache.org/viewvc?rev=1815567&view=rev
Log:
Add more to kylin on aws

Modified:
    kylin/site/docs21/install/kylin_aws_emr.html
    kylin/site/feed.xml

Modified: kylin/site/docs21/install/kylin_aws_emr.html
URL: 
http://svn.apache.org/viewvc/kylin/site/docs21/install/kylin_aws_emr.html?rev=1815567&r1=1815566&r2=1815567&view=diff
==============================================================================
--- kylin/site/docs21/install/kylin_aws_emr.html (original)
+++ kylin/site/docs21/install/kylin_aws_emr.html Fri Nov 17 14:01:38 2017
@@ -3204,7 +3204,7 @@
 
 <p>You can select “HDFS” or “S3” as the storage for HBase, depending 
on whether you need Cube data to be persisted after the cluster shuts down. EMR 
HDFS uses the local disks of the EC2 instances, so the data is erased when the 
cluster is stopped, and Kylin metadata and Cube data can be lost.</p>
 
-<p>If you use “S3” as HBase’s storage, you need customize its 
configuration for “hbase.rpc.timeout”, because the bulk load to S3 is a 
copy operation, when data size is huge, HBase region server need wait much 
longer time than on HDFS to finish.</p>
+<p>If you use “S3” as HBase’s storage, you need to customize its 
configuration for “<strong>hbase.rpc.timeout</strong>”, because the bulk 
load to S3 is a copy operation; when the data size is huge, the HBase region server 
needs to wait much longer to finish than on HDFS.</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>[  {
     "Classification": "hbase-site",
@@ -3254,34 +3254,71 @@ tar –zxvf apache-kylin-2.2.0-bin-hb
   <li>Use HDFS as “kylin.env.hdfs-working-dir”</li>
 </ul>
 
-<p>If using HDFS as Kylin working directory, you can leave configurations 
unchanged as EMR’s default FS is HDFS:</p>
+<p>EMR recommends that you “use HDFS for intermediate data storage while the 
cluster is running and Amazon S3 only to input the initial data and output the 
final results”.</p>
+
+<p>If you use HDFS as the Kylin working directory, you can leave the configuration 
unchanged, as EMR’s default FS is HDFS:</p>
 
 <div class="highlighter-rouge"><pre 
class="highlight"><code>kylin.env.hdfs-working-dir=/kylin
 </code></pre>
 </div>
 
-<p>This will be very similar as on-premises deployment.</p>
+<p>Before you shut down or restart the cluster, you can back up the data on HDFS to 
S3 with <a 
href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html">S3DistCp</a>.</p>
 
 <ul>
   <li>Use S3 as “kylin.env.hdfs-working-dir”</li>
 </ul>
 
-<p>Configure the following 2 parameters:</p>
+<p>If you want to use S3 for all storage (assuming HBase is also on S3), 
configure the following 2 parameters:</p>
 
 <div class="highlighter-rouge"><pre 
class="highlight"><code>kylin.env.hdfs-working-dir=s3://yourbucket/kylin
 kylin.storage.hbase.cluster-fs=s3://yourbucket
 
 </code></pre>
 </div>
-<p>Then Kylin will use S3 for Cube building, big metadata file and Cube. The 
performance might be slower than HDFS.</p>
+
+<p>The intermediate files and the HFiles will all be written to S3. The build 
performance will likely be slower than on HDFS. Make sure you understand the 
differences between S3 and HDFS.</p>
+
+<ul>
+  <li>Hadoop configurations</li>
+</ul>
+
+<p>Some Hadoop configurations need to be applied for better performance and data 
consistency on S3, according to <a 
href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-errors-io.html">emr-troubleshoot-errors-io</a></p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>&lt;property&gt;
+  &lt;name&gt;io.file.buffer.size&lt;/name&gt;
+  &lt;value&gt;65536&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;mapred.map.tasks.speculative.execution&lt;/name&gt;
+  &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;mapred.reduce.tasks.speculative.execution&lt;/name&gt;
+  &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;mapreduce.map.speculative&lt;/name&gt;
+  &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;mapreduce.reduce.speculative&lt;/name&gt;
+  &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+
+</code></pre>
+</div>
 
 <ul>
   <li>Create the working-dir folder if it doesn’t exist</li>
 </ul>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>hadoop fs -mkdir 
/kylin 
-or
-hadoop fs -mkdir s3://yourbucket/kylin
+</code></pre>
+</div>
+
+<p>or</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>hadoop fs -mkdir 
s3://yourbucket/kylin
 </code></pre>
 </div>
 

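Note on the S3DistCp backup step mentioned in the change above: the following is a minimal shell sketch, not part of the commit. It assumes the s3-dist-cp command that ships with EMR and the /kylin working directory from the doc; the "yourbucket" bucket, the "kylin_backup" prefix, and the backup_cmd helper are hypothetical names used only for illustration. The helper prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: back up the Kylin working directory from HDFS to S3 before
# shutting down the EMR cluster.
# Assumptions: s3-dist-cp is available (it ships with EMR); "yourbucket"
# and the "kylin_backup" prefix are placeholders, not values from the commit.
SRC="hdfs:///kylin"
DEST="s3://yourbucket/kylin_backup"

# Hypothetical helper: prints the s3-dist-cp command instead of running it.
backup_cmd() {
  printf 's3-dist-cp --src %s --dest %s\n' "$1" "$2"
}

backup_cmd "$SRC" "$DEST"
```

Running the printed line on the master node would perform the copy; swapping the --src and --dest values would copy the data back onto a new cluster's HDFS.
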
Modified: kylin/site/feed.xml
URL: 
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1815567&r1=1815566&r2=1815567&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Nov 17 14:01:38 2017
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" 
type="application/rss+xml"/>
-    <pubDate>Wed, 15 Nov 2017 19:16:44 -0800</pubDate>
-    <lastBuildDate>Wed, 15 Nov 2017 19:16:44 -0800</lastBuildDate>
+    <pubDate>Fri, 17 Nov 2017 05:59:10 -0800</pubDate>
+    <lastBuildDate>Fri, 17 Nov 2017 05:59:10 -0800</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>

