Regenerate website
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/c66525cc
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/c66525cc
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/c66525cc

Branch: refs/heads/asf-site
Commit: c66525cc62c38bfd10b3a295ed97f036aa3b856a
Parents: 920a0be
Author: Ismaël Mejía <ieme...@gmail.com>
Authored: Tue Jun 27 11:57:06 2017 +0200
Committer: Ismaël Mejía <ieme...@gmail.com>
Committed: Tue Jun 27 11:57:06 2017 +0200

----------------------------------------------------------------------
 .../documentation/io/built-in/hadoop/index.html | 42 ++++++++++++++++++++
 1 file changed, 42 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/beam-site/blob/c66525cc/content/documentation/io/built-in/hadoop/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/io/built-in/hadoop/index.html b/content/documentation/io/built-in/hadoop/index.html
index a18c9b9..ce66332 100644
--- a/content/documentation/io/built-in/hadoop/index.html
+++ b/content/documentation/io/built-in/hadoop/index.html
@@ -362,6 +362,48 @@
 </code></pre>
 </div>
+<h3 id="amazon-dynamodb---dynamodbinputformat">Amazon DynamoDB - DynamoDBInputFormat</h3>
+
+<p>To read data from Amazon DynamoDB, use <code class="highlighter-rouge">org.apache.hadoop.dynamodb.read.DynamoDBInputFormat</code>.
+DynamoDBInputFormat implements the older <code class="highlighter-rouge">org.apache.hadoop.mapred.InputFormat</code> interface, whereas HadoopInputFormatIO uses the newer abstract class <code class="highlighter-rouge">org.apache.hadoop.mapreduce.InputFormat</code>.
+A wrapper API is therefore required to act as an adapter between HadoopInputFormatIO and DynamoDBInputFormat (or, in general, any InputFormat implementing <code class="highlighter-rouge">org.apache.hadoop.mapred.InputFormat</code>).
+The example below uses one such wrapper: <a href="https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java">https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java</a></p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">Configuration</span> <span class="n">dynamoDBConf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Configuration</span><span class="o">();</span>
+<span class="n">Job</span> <span class="n">job</span> <span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span class="na">getInstance</span><span class="o">(</span><span class="n">dynamoDBConf</span><span class="o">);</span>
+<span class="n">com</span><span class="o">.</span><span class="na">twitter</span><span class="o">.</span><span class="na">elephantbird</span><span class="o">.</span><span class="na">mapreduce</span><span class="o">.</span><span class="na">input</span><span class="o">.</span><span class="na">MapReduceInputFormatWrapper</span><span class="o">.</span><span class="na">setInputFormat</span><span class="o">(</span><span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">hadoop</span><span class="o">.</span><span class="na">dynamodb</span><span class="o">.</span><span class="na">read</span><span class="o">.</span><span class="na">DynamoDBInputFormat</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">job</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span> <span class="o">=</span> <span class="n">job</span><span class="o">.</span><span class="na">getConfiguration</span><span class="o">();</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">setClass</span><span class="o">(</span><span class="s">"key.class"</span><span class="o">,</span> <span class="n">Text</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">WritableComparable</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">setClass</span><span class="o">(</span><span class="s">"value.class"</span><span class="o">,</span> <span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">hadoop</span><span class="o">.</span><span class="na">dynamodb</span><span class="o">.</span><span class="na">DynamoDBItemWritable</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="n">Writable</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.servicename"</span><span class="o">,</span> <span class="s">"dynamodb"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.input.tableName"</span><span class="o">,</span> <span class="s">"table_name"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.endpoint"</span><span class="o">,</span> <span class="s">"dynamodb.us-west-1.amazonaws.com"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.regionid"</span><span class="o">,</span> <span class="s">"us-west-1"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.throughput.read"</span><span class="o">,</span> <span class="s">"1"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.throughput.read.percent"</span><span class="o">,</span> <span class="s">"1"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="s">"dynamodb.version"</span><span class="o">,</span> <span class="s">"2011-12-05"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="n">DynamoDBConstants</span><span class="o">.</span><span class="na">DYNAMODB_ACCESS_KEY_CONF</span><span class="o">,</span> <span class="s">"aws_access_key"</span><span class="o">);</span>
+<span class="n">dynamoDBConf</span><span class="o">.</span><span class="na">set</span><span class="o">(</span><span class="n">DynamoDBConstants</span><span class="o">.</span><span class="na">DYNAMODB_SECRET_KEY_CONF</span><span class="o">,</span> <span class="s">"aws_secret_key"</span><span class="o">);</span>
+</code></pre>
+</div>
+
+<div class="language-py highlighter-rouge"><pre class="highlight"><code> <span class="c"># The Beam SDK for Python does not support Hadoop InputFormat IO.</span>
+</code></pre>
+</div>
+
+<p>Call the Read transform as follows:</p>
+
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">DynamoDBItemWritable</span><span class="o">&gt;&gt;</span> <span class="n">dynamoDBData</span> <span class="o">=</span>
+ <span class="n">p</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="s">"read"</span><span class="o">,</span>
+ <span class="n">HadoopInputFormatIO</span><span class="o">.&lt;</span><span class="n">Text</span><span class="o">,</span> <span class="n">DynamoDBItemWritable</span><span class="o">&gt;</span><span class="n">read</span><span class="o">()</span>
+ <span class="o">.</span><span class="na">withConfiguration</span><span class="o">(</span><span class="n">dynamoDBConf</span><span class="o">));</span>
+</code></pre>
+</div>
+
+<div class="language-py highlighter-rouge"><pre class="highlight"><code> <span class="c"># The Beam SDK for Python does not support Hadoop InputFormat IO.</span>
+</code></pre>
+</div>
+
 </div>
 <footer class="footer">
 <div class="footer__contained">