kylin_aws_emr.html feed.xml

lidong Fri, 17 Nov 2017 06:02:05 -0800

Author: lidong
Date: Fri Nov 17 14:01:38 2017
New Revision: 1815567

URL: http://svn.apache.org/viewvc?rev=1815567&view=rev
Log:
Add more to kylin on aws


Modified:
    kylin/site/docs21/install/kylin_aws_emr.html
    kylin/site/feed.xml

Modified: kylin/site/docs21/install/kylin_aws_emr.html
URL: http://svn.apache.org/viewvc/kylin/site/docs21/install/kylin_aws_emr.html?rev=1815567&r1=1815566&r2=1815567&view=diff
==============================================================================
--- kylin/site/docs21/install/kylin_aws_emr.html (original)
+++ kylin/site/docs21/install/kylin_aws_emr.html Fri Nov 17 14:01:38 2017
@@ -3204,7 +3204,7 @@
 
 <p>You can select “HDFS” or “S3” as the storage for HBase, depending on whether you need Cube data be persisted after shutting down the cluster. EMR HDFS uses the local disk of EC2 instances, which will erase the data when cluster is stopped, then Kylin metadata and Cube data can be lost.</p>
 
-<p>If you use “S3” as HBase’s storage, you need customize its configuration for “hbase.rpc.timeout”, because the bulk load to S3 is a copy operation, when data size is huge, HBase region server need wait much longer time than on HDFS to finish.</p>
+<p>If you use “S3” as HBase’s storage, you need customize its configuration for “<strong>hbase.rpc.timeout</strong>”, because the bulk load to S3 is a copy operation, when data size is huge, HBase region server need wait much longer to finish than on HDFS.</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>[  {
     "Classification": "hbase-site",
@@ -3254,34 +3254,71 @@ tar -zxvf apache-kylin-2.2.0-bin-hb
   <li>Use HDFS as “kylin.env.hdfs-working-dir”</li>
 </ul>
 
-<p>If using HDFS as Kylin working directory, you can leave configurations unchanged as EMR’s default FS is HDFS:</p>
+<p>EMR recommends to “use HDFS for intermediate data storage while the cluster is running and Amazon S3 only to input the initial data and output the final results”.</p>
+
+<p>If using HDFS as Kylin working directory, you just leave configurations unchanged as EMR’s default FS is HDFS:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>kylin.env.hdfs-working-dir=/kylin
 </code></pre>
 </div>
 
-<p>This will be very similar as on-premises deployment.</p>
+<p>Before you shudown/restart the cluster, you can backup the data on HDFS to S3 with <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html">S3DistCp</a>.</p>
 
 <ul>
   <li>Use S3 as “kylin.env.hdfs-working-dir”</li>
 </ul>
 
-<p>Configure the following 2 parameters:</p>
+<p>If you want to totally use S3 as storage (assume HBase is also on S3), configure the following 2 parameters:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>kylin.env.hdfs-working-dir=s3://yourbucket/kylin
 kylin.storage.hbase.cluster-fs=s3://yourbucket
 
 </code></pre>
 </div>
-<p>Then Kylin will use S3 for Cube building, big metadata file and Cube. The performance might be slower than HDFS.</p>
+
+<p>The intermediate file and the HFile will all be written to S3. The build performance should be slower than HDFS. Make sure you have a good understanding about the difference between S3 and HDFS.</p>
+
+<ul>
+  <li>Hadoop configurations</li>
+</ul>
+
+<p>Some Hadoop configurations need be applied for better performance and data consistency on S3, according to <a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-errors-io.html">emr-troubleshoot-errors-io</a></p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>&lt;property&gt;
+  &lt;name&gt;io.file.buffer.size&lt;/name&gt;
+  &lt;value&gt;65536&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;mapred.map.tasks.speculative.execution&lt;/name&gt;
+  &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;mapred.reduce.tasks.speculative.execution&lt;/name&gt;
+  &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;mapreduce.map.speculative&lt;/name&gt;
+  &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;mapreduce.reduce.speculative&lt;/name&gt;
+  &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+
+</code></pre>
+</div>
 
 <ul>
   <li>Create the working-dir folder if it doesn’t exist</li>
 </ul>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>hadoop fs -mkdir /kylin
-or
-hadoop fs -mkdir s3://yourbucket/kylin
+</code></pre>
+</div>
+
+<p>or</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>hadoop fs -mkdir s3://yourbucket/kylin
 </code></pre>
 </div>
 

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1815567&r1=1815566&r2=1815567&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Nov 17 14:01:38 2017
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 15 Nov 2017 19:16:44 -0800</pubDate>
-    <lastBuildDate>Wed, 15 Nov 2017 19:16:44 -0800</lastBuildDate>
+    <pubDate>Fri, 17 Nov 2017 05:59:10 -0800</pubDate>
+    <lastBuildDate>Fri, 17 Nov 2017 05:59:10 -0800</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>

svn commit: r1815567 - in /kylin/site: docs21/install/kylin_aws_emr.html feed.xml
