Author: buildbot
Date: Wed Oct 29 03:42:33 2014
New Revision: 927240
Log:
Staging update by buildbot for crunch
Modified:
websites/staging/crunch/trunk/content/ (props changed)
websites/staging/crunch/trunk/content/user-guide.html
Propchange: websites/staging/crunch/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Oct 29 03:42:33 2014
@@ -1 +1 @@
-1635034
+1635035
Modified: websites/staging/crunch/trunk/content/user-guide.html
==============================================================================
--- websites/staging/crunch/trunk/content/user-guide.html (original)
+++ websites/staging/crunch/trunk/content/user-guide.html Wed Oct 29 03:42:33 2014
@@ -649,30 +649,29 @@ includes both Avro generic and specific
<p>The <a
href="apidocs/0.10.0/org/apache/crunch/types/avro/Avros.html">Avros</a> class
also has a <code>reflects</code> method for creating PTypes
-for POJOs using Avro's reflection-based serialization mechanism. There are a
couple of restrictions on the structure of
-the POJO:</p>
-<ol>
-<li>It must have a default, no-arg constructor.</li>
-<li>
-<p>All of its fields must be Avro primitive types or collection types that
have Avro equivalents, like <code>ArrayList</code> and
-<code>HashMap<String, T></code>. You may also have arrays of Avro
primitive types.</p>
-<p>// Declare an inline data type and use it for Crunch serialization
-public static class UrlData {
- // The fields don't have to be public, just doing this for the example.
- double curPageRank;
- String[] outboundUrls;</p>
-<p>// Remember: you must have a no-arg constructor.
- public UrlData() { this(0.0, new String[0]); }</p>
-<p>// The regular constructor
- public UrlData(double pageRank, String[] outboundUrls) {
- this.curPageRank = pageRank;
- this.outboundUrls = outboundUrls;
- }
-}</p>
-<p>PType<UrlData> urlDataType = Avros.reflects(UrlData.class);
-PTableType<String, UrlData> pageRankType = Avros.tableOf(Avros.strings(),
urlDataType);</p>
-</li>
-</ol>
+for POJOs using Avro's reflection-based serialization mechanism. There are a
couple of restrictions on the structure of the POJO. First, it must have a
default, no-arg constructor. Second, all of its fields must be Avro primitive
types or collection types that have Avro equivalents, like
<code>ArrayList</code> and <code>HashMap<String, T></code>. You may also
have arrays of Avro primitive types.
+</p>
+<div class="codehilite"><pre><span class="c1">// Declare an inline data type
and use it for Crunch serialization</span>
+<span class="n">public</span> <span class="k">static</span> <span
class="k">class</span> <span class="n">UrlData</span> <span class="p">{</span>
+ <span class="c1">// The fields don't have to be public, just doing this
for the example.</span>
+ <span class="n">double</span> <span class="n">curPageRank</span><span
class="p">;</span>
+ <span class="n">String</span><span class="p">[]</span> <span
class="n">outboundUrls</span><span class="p">;</span>
+
+ <span class="c1">// Remember: you must have a no-arg constructor. </span>
+ <span class="n">public</span> <span class="n">UrlData</span><span
class="p">()</span> <span class="p">{</span> <span class="k">this</span><span
class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span
class="k">new</span> <span class="n">String</span><span class="p">[</span><span
class="mh">0</span><span class="p">]);</span> <span class="p">}</span>
+
+ <span class="c1">// The regular constructor</span>
+ <span class="n">public</span> <span class="n">UrlData</span><span
class="p">(</span><span class="n">double</span> <span
class="n">pageRank</span><span class="p">,</span> <span
class="n">String</span><span class="p">[]</span> <span
class="n">outboundUrls</span><span class="p">)</span> <span class="p">{</span>
+ <span class="k">this</span><span class="p">.</span><span
class="n">curPageRank</span> <span class="o">=</span> <span
class="n">pageRank</span><span class="p">;</span>
+ <span class="k">this</span><span class="p">.</span><span
class="n">outboundUrls</span> <span class="o">=</span> <span
class="n">outboundUrls</span><span class="p">;</span>
+ <span class="p">}</span>
+<span class="p">}</span>
+
+<span class="n">PType</span><span class="o"><</span><span
class="n">UrlData</span><span class="o">></span> <span
class="n">urlDataType</span> <span class="o">=</span> <span
class="n">Avros</span><span class="p">.</span><span
class="n">reflects</span><span class="p">(</span><span
class="n">UrlData</span><span class="p">.</span><span
class="k">class</span><span class="p">);</span>
+<span class="n">PTableType</span><span class="o"><</span><span
class="n">String</span><span class="p">,</span> <span
class="n">UrlData</span><span class="o">></span> <span
class="n">pageRankType</span> <span class="o">=</span> <span
class="n">Avros</span><span class="p">.</span><span
class="n">tableOf</span><span class="p">(</span><span
class="n">Avros</span><span class="p">.</span><span
class="n">strings</span><span class="p">(),</span> <span
class="n">urlDataType</span><span class="p">);</span>
+</pre></div>
+
+
<p>Avro reflection is a great way to define intermediate types for your Crunch
pipelines; not only is your logic clear
and easy to test, but the fact that the data is written out as Avro records
means that you can use tools like Hive and Pig
to query intermediate results to aid in debugging pipeline failures.</p>