Author: buildbot
Date: Sat Oct 24 23:07:36 2015
New Revision: 970127
Log:
Staging update by buildbot for gora
Modified:
websites/staging/gora/trunk/content/ (props changed)
websites/staging/gora/trunk/content/current/gora-core.html
websites/staging/gora/trunk/content/current/tutorial.html
Propchange: websites/staging/gora/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sat Oct 24 23:07:36 2015
@@ -1 +1 @@
-1710395
+1710397
Modified: websites/staging/gora/trunk/content/current/gora-core.html
==============================================================================
--- websites/staging/gora/trunk/content/current/gora-core.html (original)
+++ websites/staging/gora/trunk/content/current/gora-core.html Sat Oct 24
23:07:36 2015
@@ -303,7 +303,7 @@ This datastore supports MapReduce.</p>
<p>In the stores covered within the gora-core module, no physical mappings are
required.</p>
<h1 id="gorasparkengine">GoraSparkEngine<a class="headerlink"
href="#gorasparkengine" title="Permanent link">¶</a></h1>
<h2 id="description_3">Description<a class="headerlink" href="#description_3"
title="Permanent link">¶</a></h2>
-<p>GoraSparkEngine is Spark backend of Apache Gora. Assume that input and
output data stores are:</p>
+<p>GoraSparkEngine is the Spark backend of Gora. Assume that the input and output data
stores are:</p>
<div class="codehilite"><pre><span class="n">DataStore</span><span
class="o"><</span><span class="n">K1</span><span class="p">,</span> <span
class="n">V1</span><span class="o">></span> <span
class="n">inStore</span><span class="p">;</span>
<span class="n">DataStore</span><span class="o"><</span><span
class="n">K2</span><span class="p">,</span> <span class="n">V2</span><span
class="o">></span> <span class="n">outStore</span><span class="p">;</span>
</pre></div>
Modified: websites/staging/gora/trunk/content/current/tutorial.html
==============================================================================
--- websites/staging/gora/trunk/content/current/tutorial.html (original)
+++ websites/staging/gora/trunk/content/current/tutorial.html Sat Oct 24
23:07:36 2015
@@ -229,6 +229,7 @@ MapReduce API in some detail.</p>
<li><a href="#running-the-job-with-hbase">Running the job with HBase</a></li>
</ul>
</li>
+<li><a href="#spark-backend">Spark Backend</a></li>
<li><a href="#more-examples">More Examples</a></li>
<li><a href="#feedback">Feedback</a></li>
</ul>
@@ -1189,6 +1190,136 @@ we can run the job with HBase as:</p>
</pre></div>
+<h2 id="spark-backend">Spark Backend<a class="headerlink"
href="#spark-backend" title="Permanent link">¶</a></h2>
+<p>This section implements the log analytics example with GoraSparkEngine to
demonstrate the Spark backend of Gora.
+Data is read from HBase, the map and reduce functions are run, and the results are
written to Solr (version 4.10.3).
+The whole process runs on Spark.</p>
+<p>First, persist the data into HBase as described in <a
href="/current/tutorial.html#log-analytics-in-mapreduce">Log analytics in
MapReduce</a>.</p>
+<p>To write the results into Solr, create a schemaless core named Metrics. The
easiest way is to rename the default core, <code>collection1</code>, to Metrics. It is located in the
+<code>solr-4.10.3/example/example-schemaless/solr</code> folder. Then edit
<code>solr-4.10.3/example/example-schemaless/solr/Metrics/core.properties</code>
as follows:</p>
+<div class="codehilite"><pre><span class="n">name</span><span
class="p">=</span><span class="n">Metrics</span>
+</pre></div>
+
+
+<p>Then start Solr:</p>
+<div class="codehilite"><pre><span class="n">solr</span><span
class="o">-</span>4<span class="p">.</span>10<span class="p">.</span>3<span
class="o">/</span><span class="n">example</span>$ <span class="n">java</span>
<span class="o">-</span><span class="n">Dsolr</span><span
class="p">.</span><span class="n">solr</span><span class="p">.</span><span
class="n">home</span><span class="p">=</span><span
class="n">example</span><span class="o">-</span><span
class="n">schemaless</span><span class="o">/</span><span
class="n">solr</span><span class="o">/</span> <span class="o">-</span><span
class="n">jar</span> <span class="n">start</span><span class="p">.</span><span
class="n">jar</span>
+</pre></div>
+
+
+<p>The job reads data from HBase, generates some metrics, and writes the results into Solr
with Spark via Gora. Here is how to initialize the input and output data stores:</p>
+<div class="codehilite"><pre><span class="n">public</span> <span
class="n">int</span> <span class="n">run</span><span class="p">(</span><span
class="n">String</span><span class="p">[]</span> <span
class="n">args</span><span class="p">)</span> <span class="n">throws</span>
<span class="n">Exception</span> <span class="p">{</span>
+ <span class="n">DataStore</span><span class="o"><</span><span
class="n">Long</span><span class="p">,</span> <span
class="n">Pageview</span><span class="o">></span> <span
class="n">inStore</span><span class="p">;</span>
+ <span class="n">DataStore</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">MetricDatum</span><span class="o">></span> <span
class="n">outStore</span><span class="p">;</span>
+ <span class="n">Configuration</span> <span class="n">hadoopConf</span> <span
class="p">=</span> <span class="n">new</span> <span
class="n">Configuration</span><span class="p">();</span>
+ <span class="k">if</span> <span class="p">(</span><span
class="n">args</span><span class="p">.</span><span class="nb">length</span>
<span class="o">></span> 0<span class="p">)</span> <span class="p">{</span>
+ <span class="n">String</span> <span class="n">dataStoreClass</span> <span
class="p">=</span> <span class="n">args</span><span class="p">[</span>0<span
class="p">];</span>
+ <span class="n">inStore</span> <span class="p">=</span> <span
class="n">DataStoreFactory</span><span class="p">.</span><span
class="n">getDataStore</span><span class="p">(</span><span
class="n">dataStoreClass</span><span class="p">,</span> <span
class="n">Long</span><span class="p">.</span><span class="n">class</span><span
class="p">,</span> <span class="n">Pageview</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span
class="n">hadoopConf</span><span class="p">);</span>
+ <span class="k">if</span> <span class="p">(</span><span
class="n">args</span><span class="p">.</span><span class="nb">length</span>
<span class="o">></span> 1<span class="p">)</span> <span class="p">{</span>
+ <span class="n">dataStoreClass</span> <span class="p">=</span> <span
class="n">args</span><span class="p">[</span>1<span class="p">];</span>
+ <span class="p">}</span>
+ <span class="n">outStore</span> <span class="p">=</span> <span
class="n">DataStoreFactory</span><span class="p">.</span><span
class="n">getDataStore</span><span class="p">(</span><span
class="n">dataStoreClass</span><span class="p">,</span> <span
class="n">String</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span
class="n">MetricDatum</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span
class="n">hadoopConf</span><span class="p">);</span>
+ <span class="p">}</span> <span class="k">else</span> <span
class="p">{</span>
+ <span class="n">inStore</span> <span class="p">=</span> <span
class="n">DataStoreFactory</span><span class="p">.</span><span
class="n">getDataStore</span><span class="p">(</span><span
class="n">Long</span><span class="p">.</span><span class="n">class</span><span
class="p">,</span> <span class="n">Pageview</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span
class="n">hadoopConf</span><span class="p">);</span>
+ <span class="n">outStore</span> <span class="p">=</span> <span
class="n">DataStoreFactory</span><span class="p">.</span><span
class="n">getDataStore</span><span class="p">(</span><span
class="n">String</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span
class="n">MetricDatum</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span
class="n">hadoopConf</span><span class="p">);</span>
+ <span class="p">}</span>
+ <span class="p">...</span>
+<span class="p">}</span>
+</pre></div>
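+<p>The argument handling in the snippet above can be read as the following plain-Java sketch. It uses no Gora classes; <code>StoreArgsSketch</code> and <code>resolveStoreClasses</code> are hypothetical names introduced only for illustration. It assumes, as in the snippet, that the first argument names the input store class, that an optional second argument names the output store class, and that with no arguments the factory falls back to its configured default store.</p>

```java
// Sketch of the argument handling only; no Gora classes are involved.
// StoreArgsSketch and resolveStoreClasses are hypothetical names.
final class StoreArgsSketch {

    /**
     * Returns {inputStoreClass, outputStoreClass}.
     * A null entry means "no class given, let the factory use its default".
     */
    static String[] resolveStoreClasses(String[] args) {
        if (args.length == 0) {
            // Mirrors the else branch: getDataStore(...) is called without a class name.
            return new String[] {null, null};
        }
        String in = args[0];
        // Mirrors the inner if: args[1] overrides the class name only when present,
        // otherwise the output store reuses the input store class.
        String out = args.length > 1 ? args[1] : args[0];
        return new String[] {in, out};
    }

    public static void main(String[] args) {
        String[] stores = resolveStoreClasses(new String[] {
            "org.apache.gora.hbase.store.HBaseStore",
            "org.apache.gora.solr.store.SolrStore"});
        System.out.println("in  = " + stores[0]);
        System.out.println("out = " + stores[1]);
    }
}
```

+<p>Note that with a single argument both stores use the same class, so reading from HBase and writing to Solr requires passing both class names.</p>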
+
+
+<p>Pass the input data store's key and value classes to instantiate a
GoraSparkEngine:</p>
+<div class="codehilite"><pre><span class="n">GoraSparkEngine</span><span
class="o"><</span><span class="n">Long</span><span class="p">,</span> <span
class="n">Pageview</span><span class="o">></span> <span
class="n">goraSparkEngine</span> <span class="p">=</span> <span
class="n">new</span> <span class="n">GoraSparkEngine</span><span
class="o"><></span><span class="p">(</span><span
class="n">Long</span><span class="p">.</span><span class="n">class</span><span
class="p">,</span> <span class="n">Pageview</span><span class="p">.</span><span
class="n">class</span><span class="p">);</span>
+</pre></div>
+
+
+<p>Construct a JavaSparkContext and register the input data store's value class as a
Kryo class:</p>
+<div class="codehilite"><pre><span class="n">SparkConf</span> <span
class="n">sparkConf</span> <span class="p">=</span> <span class="n">new</span>
<span class="n">SparkConf</span><span class="p">().</span><span
class="n">setAppName</span><span class="p">(</span>"<span
class="n">Gora</span> <span class="n">Spark</span> <span
class="n">Integration</span> <span class="n">Application</span>"<span
class="p">).</span><span class="n">setMaster</span><span
class="p">(</span>"<span class="n">local</span>"<span
class="p">);</span>
+<span class="n">Class</span><span class="p">[]</span> <span class="n">c</span>
<span class="p">=</span> <span class="n">new</span> <span
class="n">Class</span><span class="p">[</span>1<span class="p">];</span>
+<span class="n">c</span><span class="p">[</span>0<span class="p">]</span>
<span class="p">=</span> <span class="n">inStore</span><span
class="p">.</span><span class="n">getPersistentClass</span><span
class="p">();</span>
+<span class="n">sparkConf</span><span class="p">.</span><span
class="n">registerKryoClasses</span><span class="p">(</span><span
class="n">c</span><span class="p">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">sc</span> <span
class="p">=</span> <span class="n">new</span> <span
class="n">JavaSparkContext</span><span class="p">(</span><span
class="n">sparkConf</span><span class="p">);</span>
+</pre></div>
+
+
+<p>You can then get a JavaPairRDD from the input data store:</p>
+<div class="codehilite"><pre><span class="n">JavaPairRDD</span><span
class="o"><</span><span class="n">Long</span><span class="p">,</span> <span
class="n">Pageview</span><span class="o">></span> <span
class="n">goraRDD</span> <span class="p">=</span> <span
class="n">goraSparkEngine</span><span class="p">.</span><span
class="n">initialize</span><span class="p">(</span><span
class="n">sc</span><span class="p">,</span> <span class="n">inStore</span><span
class="p">);</span>
+</pre></div>
+
+
+<p>Once you have it, you can work with it just as in any other Spark
code. For example:</p>
+<div class="codehilite"><pre><span class="n">long</span> <span
class="n">count</span> <span class="p">=</span> <span
class="n">goraRDD</span><span class="p">.</span><span
class="n">count</span><span class="p">();</span>
+<span class="n">System</span><span class="p">.</span><span
class="n">out</span><span class="p">.</span><span class="n">println</span><span
class="p">(</span>"<span class="n">Total</span> <span class="n">Log</span>
<span class="n">Count</span><span class="p">:</span> " <span
class="o">+</span> <span class="n">count</span><span class="p">);</span>
+</pre></div>
+
+
+<p>Here are the functions for the map and reduce phases of this example:</p>
+<div class="codehilite"><pre><span class="cm">/** The number of milliseconds
in a day */</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="k">final</span> <span class="n">long</span> <span
class="no">DAY_MILIS</span> <span class="o">=</span> <span
class="mh">1000</span> <span class="o">*</span> <span class="mh">60</span>
<span class="o">*</span> <span class="mh">60</span> <span class="o">*</span>
<span class="mh">24</span><span class="p">;</span>
+
+<span class="cm">/**</span>
+<span class="cm">* map function used in calculation</span>
+<span class="cm">*/</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="n">Function</span><span class="o"><</span><span
class="n">Pageview</span><span class="p">,</span> <span
class="n">Tuple2</span><span class="o"><</span><span
class="n">Tuple2</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">Long</span><span class="o">></span><span class="p">,</span> <span
class="n">Long</span><span class="o">>></span> <span
class="n">mapFunc</span> <span class="o">=</span> <span class="k">new</span>
<span class="n">Function</span><span class="o"><</span><span
class="n">Pageview</span><span class="p">,</span> <span
class="n">Tuple2</span><span class="o"><</span><span
class="n">Tuple2</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">Long</span><span class="o">></span><span class="p">,</span> <span
class="n">Long</span><span class="o">>></span><span class="p">()</span>
<span class="p">{</span>
+ <span class="p">@</span><span class="n">Override</span>
+ <span class="n">public</span> <span class="n">Tuple2</span><span
class="o"><</span><span class="n">Tuple2</span><span
class="o"><</span><span class="n">String</span><span class="p">,</span>
<span class="n">Long</span><span class="o">></span><span class="p">,</span>
<span class="n">Long</span><span class="o">></span> <span
class="n">call</span><span class="p">(</span><span class="n">Pageview</span>
<span class="n">pageview</span><span class="p">)</span> <span
class="n">throws</span> <span class="n">Exception</span> <span
class="p">{</span>
+ <span class="n">String</span> <span class="n">url</span> <span
class="o">=</span> <span class="n">pageview</span><span class="p">.</span><span
class="n">getUrl</span><span class="p">().</span><span
class="n">toString</span><span class="p">();</span>
+ <span class="n">Long</span> <span class="n">day</span> <span
class="o">=</span> <span class="n">getDay</span><span class="p">(</span><span
class="n">pageview</span><span class="p">.</span><span
class="n">getTimestamp</span><span class="p">());</span>
+ <span class="n">Tuple2</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">Long</span><span class="o">></span> <span
class="n">keyTuple</span> <span class="o">=</span> <span class="k">new</span>
<span class="n">Tuple2</span><span class="o"><></span><span
class="p">(</span><span class="n">url</span><span class="p">,</span> <span
class="n">day</span><span class="p">);</span>
+ <span class="k">return</span> <span class="k">new</span> <span
class="n">Tuple2</span><span class="o"><></span><span
class="p">(</span><span class="n">keyTuple</span><span class="p">,</span> <span
class="mh">1</span><span class="no">L</span><span class="p">);</span>
+ <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="cm">/**</span>
+<span class="cm">* reduce function used in calculation</span>
+<span class="cm">*/</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="n">Function2</span><span class="o"><</span><span
class="n">Long</span><span class="p">,</span> <span class="n">Long</span><span
class="p">,</span> <span class="n">Long</span><span class="o">></span> <span
class="n">redFunc</span> <span class="o">=</span> <span class="k">new</span>
<span class="n">Function2</span><span class="o"><</span><span
class="n">Long</span><span class="p">,</span> <span class="n">Long</span><span
class="p">,</span> <span class="n">Long</span><span class="o">></span><span
class="p">()</span> <span class="p">{</span>
+ <span class="p">@</span><span class="n">Override</span>
+ <span class="n">public</span> <span class="n">Long</span> <span
class="n">call</span><span class="p">(</span><span class="n">Long</span> <span
class="n">aLong</span><span class="p">,</span> <span class="n">Long</span>
<span class="n">aLong2</span><span class="p">)</span> <span
class="n">throws</span> <span class="n">Exception</span> <span
class="p">{</span>
+ <span class="k">return</span> <span class="n">aLong</span> <span
class="o">+</span> <span class="n">aLong2</span><span class="p">;</span>
+ <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="cm">/**</span>
+<span class="cm">* metric function used after reduce phase</span>
+<span class="cm">*/</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="n">PairFunction</span><span class="o"><</span><span
class="n">Tuple2</span><span class="o"><</span><span
class="n">Tuple2</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">Long</span><span class="o">></span><span class="p">,</span> <span
class="n">Long</span><span class="o">></span><span class="p">,</span> <span
class="n">String</span><span class="p">,</span> <span
class="n">MetricDatum</span><span class="o">></span> <span
class="n">metricFunc</span> <span class="o">=</span> <span class="k">new</span>
<span class="n">PairFunction</span><span class="o"><</span><span
class="n">Tuple2</span><span class="o"><</span><span
class="n">Tuple2</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">Long</span><span class="o">></span><span class="p">,</span> <span
class="n">Long</span><span
class="o">></span><span class="p">,</span> <span class="n">String</span><span
class="p">,</span> <span class="n">MetricDatum</span><span
class="o">></span><span class="p">()</span> <span class="p">{</span>
+ <span class="p">@</span><span class="n">Override</span>
+ <span class="n">public</span> <span class="n">Tuple2</span><span
class="o"><</span><span class="n">String</span><span class="p">,</span>
<span class="n">MetricDatum</span><span class="o">></span> <span
class="n">call</span><span class="p">(</span>
+ <span class="n">Tuple2</span><span class="o"><</span><span
class="n">Tuple2</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">Long</span><span class="o">></span><span class="p">,</span> <span
class="n">Long</span><span class="o">></span> <span
class="n">tuple2LongTuple2</span><span class="p">)</span> <span
class="n">throws</span> <span class="n">Exception</span> <span
class="p">{</span>
+ <span class="n">String</span> <span class="n">dimension</span> <span
class="o">=</span> <span class="n">tuple2LongTuple2</span><span
class="p">.</span><span class="n">_1</span><span class="p">().</span><span
class="n">_1</span><span class="p">();</span>
+ <span class="n">long</span> <span class="n">timestamp</span> <span
class="o">=</span> <span class="n">tuple2LongTuple2</span><span
class="p">.</span><span class="n">_1</span><span class="p">().</span><span
class="n">_2</span><span class="p">();</span>
+ <span class="n">MetricDatum</span> <span class="n">metricDatum</span>
<span class="o">=</span> <span class="k">new</span> <span
class="n">MetricDatum</span><span class="p">();</span>
+ <span class="n">metricDatum</span><span class="p">.</span><span
class="n">setMetricDimension</span><span class="p">(</span><span
class="n">dimension</span><span class="p">);</span>
+ <span class="n">metricDatum</span><span class="p">.</span><span
class="n">setTimestamp</span><span class="p">(</span><span
class="n">timestamp</span><span class="p">);</span>
+ <span class="n">String</span> <span class="n">key</span> <span
class="o">=</span> <span class="n">metricDatum</span><span
class="p">.</span><span class="n">getMetricDimension</span><span
class="p">().</span><span class="n">toString</span><span class="p">();</span>
+ <span class="n">key</span> <span class="o">+=</span> <span
class="s">"_"</span> <span class="o">+</span> <span
class="n">Long</span><span class="p">.</span><span
class="n">toString</span><span class="p">(</span><span
class="n">timestamp</span><span class="p">);</span>
+ <span class="n">metricDatum</span><span class="p">.</span><span
class="n">setMetric</span><span class="p">(</span><span
class="n">tuple2LongTuple2</span><span class="p">.</span><span
class="n">_2</span><span class="p">());</span>
+ <span class="k">return</span> <span class="k">new</span> <span
class="n">Tuple2</span><span class="o"><></span><span
class="p">(</span><span class="n">key</span><span class="p">,</span> <span
class="n">metricDatum</span><span class="p">);</span>
+ <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="cm">/**</span>
+<span class="cm">* Rolls up the given timestamp to the day cardinality, so
that data can be aggregated daily</span>
+<span class="cm">*/</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="n">long</span> <span class="n">getDay</span><span
class="p">(</span><span class="n">long</span> <span
class="n">timeStamp</span><span class="p">)</span> <span class="p">{</span>
+ <span class="k">return</span> <span class="p">(</span><span
class="n">timeStamp</span> <span class="o">/</span> <span
class="no">DAY_MILIS</span><span class="p">)</span> <span class="o">*</span>
<span class="no">DAY_MILIS</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
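+<p>To make the day roll-up and the output key format concrete, here is a minimal plain-Java sketch with no Spark or Gora dependency. <code>DayRollupSketch</code> and <code>metricKey</code> are hypothetical names used only for illustration; the arithmetic and the <code>url + "_" + day</code> key format follow <code>getDay</code> and <code>metricFunc</code> above.</p>

```java
// Plain-Java sketch of the day roll-up and the output key format used above.
// DayRollupSketch and metricKey are illustrative names, not part of Gora.
final class DayRollupSketch {

    /** The number of milliseconds in a day (same value as in the tutorial). */
    static final long DAY_MILIS = 1000L * 60 * 60 * 24;

    /**
     * Rounds a millisecond timestamp down to a multiple of one day.
     * Integer division discards the time-of-day part for non-negative timestamps.
     */
    static long getDay(long timeStamp) {
        return (timeStamp / DAY_MILIS) * DAY_MILIS;
    }

    /** Builds the "url_day" string that metricFunc uses as the output key. */
    static String metricKey(String url, long day) {
        return url + "_" + Long.toString(day);
    }

    public static void main(String[] args) {
        long ts = 86_400_000L + 3_600_000L; // one day and one hour after the epoch
        long day = getDay(ts);
        System.out.println(day);                          // prints 86400000
        System.out.println(metricKey("/index.php", day)); // prints /index.php_86400000
    }
}
```

+<p>Because every pageview of the same URL on the same day maps to the same <code>(url, day)</code> pair, the reduce step can sum the <code>1L</code> values per pair to get a daily count.</p>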
+
+
+<p>Here is how to run the map and reduce functions on the existing JavaPairRDD:</p>
+<div class="codehilite"><pre><span class="n">JavaRDD</span><span
class="o"><</span><span class="n">Tuple2</span><span
class="o"><</span><span class="n">Tuple2</span><span
class="o"><</span><span class="n">String</span><span class="p">,</span>
<span class="n">Long</span><span class="o">></span><span class="p">,</span>
<span class="n">Long</span><span class="o">>></span> <span
class="n">mappedGoraRdd</span> <span class="p">=</span> <span
class="n">goraRDD</span><span class="p">.</span><span
class="n">values</span><span class="p">().</span><span
class="n">map</span><span class="p">(</span><span class="n">mapFunc</span><span
class="p">);</span>
+<span class="n">JavaPairRDD</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">MetricDatum</span><span class="o">></span> <span
class="n">reducedGoraRdd</span> <span class="p">=</span> <span
class="n">JavaPairRDD</span><span class="p">.</span><span
class="n">fromJavaRDD</span><span class="p">(</span><span
class="n">mappedGoraRdd</span><span class="p">).</span><span
class="n">reduceByKey</span><span class="p">(</span><span
class="n">redFunc</span><span class="p">).</span><span
class="n">mapToPair</span><span class="p">(</span><span
class="n">metricFunc</span><span class="p">);</span>
+</pre></div>
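+<p>As a rough illustration of what this pipeline computes, the following plain-Java sketch performs the same aggregation on an in-memory list, without Spark. <code>PageviewCountSketch</code>, its nested <code>View</code> class, and the sample data are assumptions made for this example; <code>View</code> stands in for Gora's <code>Pageview</code> and carries only the two fields the job reads.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Spark) of what the map and reduce steps compute.
// The class names and sample data are made up for illustration.
final class PageviewCountSketch {

    static final long DAY_MILIS = 1000L * 60 * 60 * 24;

    /** Minimal stand-in for Pageview: only the URL and the timestamp. */
    static final class View {
        final String url;
        final long timestamp;

        View(String url, long timestamp) {
            this.url = url;
            this.timestamp = timestamp;
        }
    }

    /** Counts views per "url_day" key, where day is the timestamp rolled up to the day. */
    static Map<String, Long> countPerUrlAndDay(List<View> views) {
        Map<String, Long> counts = new TreeMap<>();
        for (View v : views) {
            long day = (v.timestamp / DAY_MILIS) * DAY_MILIS; // map: roll up to the day
            String key = v.url + "_" + day;                   // same key format as metricFunc
            counts.merge(key, 1L, Long::sum);                 // reduce: sum the 1L values per key
        }
        return counts;
    }

    public static void main(String[] args) {
        List<View> views = Arrays.asList(
            new View("/index.php", 1_000L),
            new View("/index.php", 2_000L),
            new View("/index.php", DAY_MILIS + 5L),
            new View("/about.html", 3_000L));
        // prints {/about.html_0=1, /index.php_0=2, /index.php_86400000=1}
        System.out.println(countPerUrlAndDay(views));
    }
}
```

+<p>In the Spark version the per-key summation is distributed by <code>reduceByKey</code>, and the string key is built afterwards in <code>mapToPair</code>; the sketch folds both into one loop for brevity.</p>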
+
+
+<p>To persist the results into the output data store (Solr in this example),
do the following:</p>
+<div class="codehilite"><pre><span class="n">Configuration</span> <span
class="n">sparkHadoopConf</span> <span class="p">=</span> <span
class="n">goraSparkEngine</span><span class="p">.</span><span
class="n">generateOutputConf</span><span class="p">(</span><span
class="n">outStore</span><span class="p">);</span>
+<span class="n">reducedGoraRdd</span><span class="p">.</span><span
class="n">saveAsNewAPIHadoopDataset</span><span class="p">(</span><span
class="n">sparkHadoopConf</span><span class="p">);</span>
+</pre></div>
+
+
+<p>That's all! You can check Solr to verify the results.</p>
<h2 id="more-examples">More Examples<a class="headerlink"
href="#more-examples" title="Permanent link">¶</a></h2>
<p>Other than this tutorial, there are several places that you can find
examples of Gora in action.</p>