Author: buildbot
Date: Mon Sep 29 01:07:56 2014
New Revision: 923977
Log:
Staging update by buildbot for gora
Modified:
websites/staging/gora/trunk/content/ (props changed)
websites/staging/gora/trunk/content/current/index.html
Propchange: websites/staging/gora/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Sep 29 01:07:56 2014
@@ -1 +1 @@
-1628110
+1628111
Modified: websites/staging/gora/trunk/content/current/index.html
==============================================================================
--- websites/staging/gora/trunk/content/current/index.html (original)
+++ websites/staging/gora/trunk/content/current/index.html Mon Sep 29 01:07:56
2014
@@ -276,25 +276,31 @@ detected.</p>
<h4 id="building-goraci">Building GoraCI</h4>
<p>As GoraCI is packaged with the Gora master branch source it is
automatically
built every time you execute</p>
-<p><code>mvn install</code></p>
+<div class="codehilite"><pre><span class="n">mvn</span> <span
class="n">install</span>
+</pre></div>
+
+
<p>The Maven pom file has some profiles that attempt to make it easier to run
GoraCI against different Gora backends by copying the jars you need into
<code>lib</code>.
Before packaging, it is important to edit <code>gora.properties</code> and set it
correctly
for your datastore. To run against Accumulo, do the following.</p>
-<p><code>
- vim src/main/resources/gora.properties //set Accumulo properties</p>
-<p>mvn package -Paccumulo-1.4
-</code></p>
+<div class="codehilite"><pre><span class="n">vim</span> <span
class="n">src</span><span class="o">/</span><span class="n">main</span><span
class="o">/</span><span class="n">resources</span><span class="o">/</span><span
class="n">gora</span><span class="p">.</span><span class="k">properties</span>
<span class="o">//</span><span class="n">set</span> <span
class="n">Accumulo</span> <span class="k">properties</span>
+<span class="n">mvn</span> <span class="n">package</span> <span
class="o">-</span><span class="n">Paccumulo</span><span
class="o">-</span>1<span class="p">.</span>4
+</pre></div>
+
+
<p>To run against HBase, do the following.</p>
-<p><code>
- vim src/main/resources/gora.properties //set HBase properties</p>
-<p>mvn package -Phbase-0.92
-</code></p>
+<div class="codehilite"><pre><span class="n">vim</span> <span
class="n">src</span><span class="o">/</span><span class="n">main</span><span
class="o">/</span><span class="n">resources</span><span class="o">/</span><span
class="n">gora</span><span class="p">.</span><span class="k">properties</span>
<span class="o">//</span><span class="n">set</span> <span
class="n">HBase</span> <span class="k">properties</span>
+<span class="n">mvn</span> <span class="n">package</span> <span
class="o">-</span><span class="n">Phbase</span><span class="o">-</span>0<span
class="p">.</span>92
+</pre></div>
+
+
<p>To run against Cassandra, do the following.</p>
-<p><code>
- vim src/main/resources/gora.properties //set Cassandra properties</p>
-<p>mvn package -Pcassandra-1.1.2
-</code></p>
+<div class="codehilite"><pre><span class="n">vim</span> <span
class="n">src</span><span class="o">/</span><span class="n">main</span><span
class="o">/</span><span class="n">resources</span><span class="o">/</span><span
class="n">gora</span><span class="p">.</span><span class="k">properties</span>
<span class="o">//</span><span class="n">set</span> <span
class="n">Cassandra</span> <span class="k">properties</span>
+<span class="n">mvn</span> <span class="n">package</span> <span
class="o">-</span><span class="n">Pcassandra</span><span
class="o">-</span>1<span class="p">.</span>1<span class="p">.</span>2
+</pre></div>
+
+
<p>For other datastores mentioned in <code>gora.properties</code>, you will
need to copy the
appropriate deps into <code>lib</code>. Feel free to update the pom with
other profiles, <a href="https://issues.apache.org/jira/browse/GORA/">open
a ticket</a> or just <a href="https://github.com/apache/gora/">send us a pull
request</a>.</p>
@@ -316,10 +322,11 @@ a ticket</a> or just <a href="https://gi
<p><a
href="https://github.com/apache/gora/blob/master/gora-goraci/goraci.sh">goraci.sh</a>
is a helper script that you can use to run the above programs. It
assumes all needed jars are in the <code>lib</code> dir. It does not need the
package name.
You can just run <code>goraci.sh Generator</code>; below is an example.</p>
-<p><code>
- $ ./goraci.sh Generator</p>
-<p>Usage : Generator <num mappers> <num nodes>
-</code></p>
+<div class="codehilite"><pre>$ <span class="o">./</span><span
class="n">goraci</span><span class="p">.</span><span class="n">sh</span> <span
class="n">Generator</span>
+<span class="n">Usage</span> <span class="p">:</span> <span
class="n">Generator</span> <span class="o">&lt;</span><span
class="n">num</span> <span class="n">mappers</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">num</span> <span
class="n">nodes</span><span class="o">&gt;</span>
+</pre></div>
+
+
<p>For Gora to work, it needs a <code>gora.properties</code> file and a
<code>gora-$datastore-mapping.xml</code> mapping file on the classpath. The
contents of both are datastore specific;
more details can be found here [2]. You can edit the ones in src/main/resources
@@ -334,35 +341,38 @@ jackson-core-asl-1.4.2.jar and jackson-m
<h4 id="goraci-and-hbase">GoraCI and HBase</h4>
<p>To improve the performance of read jobs such as the Verify step, enable
scanner caching on the command line. For example:</p>
-<p><code>
- $ ./goraci.sh Verify -Dhbase.client.scanner.caching=1000 \
- -Dmapred.map.tasks.speculative.execution=false verify_dir 1000
-</code></p>
+<div class="codehilite"><pre>$ <span class="o">./</span><span
class="n">goraci</span><span class="p">.</span><span class="n">sh</span> <span
class="n">Verify</span> <span class="o">-</span><span
class="n">Dhbase</span><span class="p">.</span><span
class="n">client</span><span class="p">.</span><span
class="n">scanner</span><span class="p">.</span><span
class="n">caching</span><span class="p">=</span>1000 <span class="o">\</span>
+ <span class="o">-</span><span class="n">Dmapred</span><span
class="p">.</span><span class="n">map</span><span class="p">.</span><span
class="n">tasks</span><span class="p">.</span><span
class="n">speculative</span><span class="p">.</span><span
class="n">execution</span><span class="p">=</span><span class="n">false</span>
<span class="n">verify_dir</span> 1000
+</pre></div>
+
+
<p>Depending on how your Hadoop and HBase setup is deployed, you may
need to
adjust the <code>goraci.sh</code> script. Here is one suggestion
that may help
when your Hadoop and HBase configuration lives outside the
Hadoop and HBase home directories.</p>
-<p><code>
- diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
- index db1562a..31c3c94 100755
- --- a/org.apache.gora.goraci.sh
- +++ b/org.apache.gora.goraci.sh
- @@ -95,6 +95,4 @@ done
- #run it
- export HADOOP_CLASSPATH="$CLASSPATH"
- LIBJARS=<code>echo $HADOOP_CLASSPATH | tr : ,</code>
- -hadoop jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar"
$CLASS -libjars "$LIBJARS" "$@"
- -
- -
- +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR}" jar
"$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files
"${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"
-</code></p>
+<div class="codehilite"><pre><span class="gh">diff --git
a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh</span>
+<span class="gh">index db1562a..31c3c94 100755</span>
+<span class="gd">--- a/org.apache.gora.goraci.sh</span>
+<span class="gi">+++ b/org.apache.gora.goraci.sh</span>
+<span class="gu">@@ -95,6 +95,4 @@ done</span>
+ #run it
+ export HADOOP_CLASSPATH="$CLASSPATH"
+ LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
+ -hadoop jar
"$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS
-libjars "$LIBJARS" "$@"
+ -
+ -
+ +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config
"${HADOOP_CONF_DIR}" jar
"$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS
-files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars
"$LIBJARS" "$@"
+</pre></div>
+
+
<p>You will need to define <code>HBASE_CONF_DIR</code> and
<code>HADOOP_CONF_DIR</code> before you run your
<strong>goraci</strong> jobs. For example:</p>
-<p><code>
- $ export HADOOP_CONF_DIR=/home/you/hadoop-conf</p>
-<p>$ export HBASE_CONF_DIR=/home/you/hbase-conf</p>
-<p>$ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./goraci.sh Generator 1000 1000000
-</code></p>
+<div class="codehilite"><pre>$ <span class="n">export</span> <span
class="n">HADOOP_CONF_DIR</span><span class="p">=</span><span
class="o">/</span><span class="n">home</span><span class="o">/</span><span
class="n">you</span><span class="o">/</span><span class="n">hadoop</span><span
class="o">-</span><span class="n">conf</span>
+$ <span class="n">export</span> <span class="n">HBASE_CONF_DIR</span><span
class="p">=</span><span class="o">/</span><span class="n">home</span><span
class="o">/</span><span class="n">you</span><span class="o">/</span><span
class="n">hbase</span><span class="o">-</span><span class="n">conf</span>
+$ <span class="n">PATH</span><span class="p">=</span><span
class="o">/</span><span class="n">home</span><span class="o">/</span><span
class="n">you</span><span class="o">/</span><span class="n">hadoop</span><span
class="o">-</span>1<span class="p">.</span>0<span class="p">.</span>2<span
class="o">/</span><span class="n">bin</span><span class="p">:</span>$<span
class="n">PATH</span> <span class="o">./</span><span
class="n">goraci</span><span class="p">.</span><span class="n">sh</span> <span
class="n">Generator</span> 1000 1000000
+</pre></div>
+
+
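<p>Because the patched script reads both configuration directories and ships <code>hbase-site.xml</code> with the job, it can help to fail early when either one is missing. The following is only a minimal sketch; the function name <code>check_conf_dirs</code> is hypothetical and is not part of <code>goraci.sh</code>.</p>

```shell
# Hypothetical pre-flight check (not part of goraci.sh): verify that the
# configuration directories used by the patched script are usable.
check_conf_dirs() {
    for name in HADOOP_CONF_DIR HBASE_CONF_DIR; do
        # Look up the variable whose name is stored in $name (POSIX sh).
        eval "dir=\${$name:-}"
        if [ -z "$dir" ]; then
            echo "$name is not set" >&2
            return 1
        fi
        if [ ! -d "$dir" ]; then
            echo "$name does not point to a directory: $dir" >&2
            return 1
        fi
    done
    # The patched command passes this file to the job with -files.
    if [ ! -f "$HBASE_CONF_DIR/hbase-site.xml" ]; then
        echo "hbase-site.xml not found in $HBASE_CONF_DIR" >&2
        return 1
    fi
    return 0
}

# Possible usage:
#   check_conf_dirs && ./goraci.sh Generator 1000 1000000
```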
<h4 id="concurrency">Concurrency</h4>
<p>It is possible to run verification at the same time as generation. To do this,
supply the -c option to Generator and Verify. This will cause Generator to
@@ -385,39 +395,42 @@ are useful for assesing performance.</p>
<p>The example below runs a test of the test: ingest one linked list, delete a
node
in it, and ensure the verification map reduce job notices that the node is missing.
Not all output is shown, just the important parts.</p>
-<p><code>
- $ ./org.apache.gora.goraci.sh Generator 1 25000000</p>
-<p>$ ./org.apache.gora.goraci.sh Print -s 2000000000000000 -l 1</p>
-<p>2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6</p>
-<p>$ ./org.apache.gora.goraci.sh Print -s 30350f9ae6f6e8f7 -l 1</p>
-<p>30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6</p>
-<p>$ ./org.apache.gora.goraci.sh Delete 30350f9ae6f6e8f7</p>
-<p>Delete returned true</p>
-<p>$ ./org.apache.gora.goraci.sh Verify gci_verify_1 2 </p>
-<p>11/12/20 17:12:31 INFO mapred.JobClient:
org.apache.gora.goraci.Verify$Counts</p>
-<p>11/12/20 17:12:31 INFO mapred.JobClient: UNDEFINED=1</p>
-<p>11/12/20 17:12:31 INFO mapred.JobClient: REFERENCED=24999998</p>
-<p>11/12/20 17:12:31 INFO mapred.JobClient: UNREFERENCED=1</p>
-<p>$ hadoop fs -cat gci_verify_1/part* 30350f9ae6f6e8f7 2000001f65dbd238
-</code></p>
+<div class="codehilite"><pre>$ <span class="o">./</span><span
class="n">goraci</span><span class="p">.</span><span class="n">sh</span> <span
class="n">Generator</span> 1 25000000
+$ <span class="o">./</span><span class="n">goraci</span><span
class="p">.</span><span class="n">sh</span> <span class="n">Print</span> <span
class="o">-</span><span class="n">s</span> 2000000000000000 <span
class="o">-</span><span class="n">l</span> 1
+ 2000001<span class="n">f65dbd238</span><span class="p">:</span>30350<span
class="n">f9ae6f6e8f7</span><span class="p">:</span>000004265852<span
class="p">:</span><span class="n">ef09f9dd</span><span
class="o">-</span>75<span class="n">b1</span><span class="o">-</span>4<span
class="n">c16</span><span class="o">-</span>9<span class="n">f14</span><span
class="o">-</span>0<span class="n">fa84f3029b6</span>
+$ <span class="o">./</span><span class="n">goraci</span><span
class="p">.</span><span class="n">sh</span> <span class="n">Print</span> <span
class="o">-</span><span class="n">s</span> 30350<span
class="n">f9ae6f6e8f7</span> <span class="o">-</span><span class="n">l</span> 1
+ 30350<span class="n">f9ae6f6e8f7</span><span class="p">:</span>4867<span
class="n">fe03de6ea6c8</span><span class="p">:</span>000003265852<span
class="p">:</span><span class="n">ef09f9dd</span><span
class="o">-</span>75<span class="n">b1</span><span class="o">-</span>4<span
class="n">c16</span><span class="o">-</span>9<span class="n">f14</span><span
class="o">-</span>0<span class="n">fa84f3029b6</span>
+$ <span class="o">./</span><span class="n">goraci</span><span
class="p">.</span><span class="n">sh</span> <span class="n">Delete</span>
30350<span class="n">f9ae6f6e8f7</span>
+ <span class="n">Delete</span> <span class="n">returned</span> <span
class="n">true</span>
+$ <span class="o">./</span><span class="n">goraci</span><span
class="p">.</span><span class="n">sh</span> <span class="n">Verify</span> <span
class="n">gci_verify_1</span> 2
+ 11<span class="o">/</span>12<span class="o">/</span>20 17<span
class="p">:</span>12<span class="p">:</span>31 <span class="n">INFO</span>
<span class="n">mapred</span><span class="p">.</span><span
class="n">JobClient</span><span class="p">:</span> <span
class="n">org</span><span class="p">.</span><span class="n">apache</span><span
class="p">.</span><span class="n">gora</span><span class="p">.</span><span
class="n">goraci</span><span class="p">.</span><span
class="n">Verify</span>$<span class="n">Counts</span>
+ 11<span class="o">/</span>12<span class="o">/</span>20 17<span
class="p">:</span>12<span class="p">:</span>31 <span class="n">INFO</span>
<span class="n">mapred</span><span class="p">.</span><span
class="n">JobClient</span><span class="p">:</span> <span
class="n">UNDEFINED</span><span class="p">=</span>1
+ 11<span class="o">/</span>12<span class="o">/</span>20 17<span
class="p">:</span>12<span class="p">:</span>31 <span class="n">INFO</span>
<span class="n">mapred</span><span class="p">.</span><span
class="n">JobClient</span><span class="p">:</span> <span
class="n">REFERENCED</span><span class="p">=</span>24999998
+ 11<span class="o">/</span>12<span class="o">/</span>20 17<span
class="p">:</span>12<span class="p">:</span>31 <span class="n">INFO</span>
<span class="n">mapred</span><span class="p">.</span><span
class="n">JobClient</span><span class="p">:</span> <span
class="n">UNREFERENCED</span><span class="p">=</span>1
+$ <span class="n">hadoop</span> <span class="n">fs</span> <span
class="o">-</span><span class="nb">cat</span> <span
class="n">gci_verify_1</span><span class="o">/</span><span
class="n">part</span><span class="o">\*</span> 30350<span
class="n">f9ae6f6e8f7</span> 2000001<span class="n">f65dbd238</span>
+</pre></div>
+
+
<p>The map reduce job found the one undefined node and gave the node that
referenced it.</p>
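<p>The counter check in the run above can be automated by reading the job output. The sketch below assumes the counters appear as <code>NAME=value</code> lines like the ones shown, and that a counter which is absent was never incremented. The function names <code>counter_value</code> and <code>verify_ok</code> are hypothetical.</p>

```shell
# Print the value of a named Verify counter found in job output on stdin.
# Prints 0 when the counter does not appear (assumption: an absent counter
# was never incremented). The leading whitespace in the pattern keeps
# REFERENCED from matching inside UNREFERENCED.
counter_value() {
    name=$1
    value=$(sed -n "s/.*[[:space:]]${name}=\([0-9][0-9]*\).*/\1/p" | tail -n 1)
    echo "${value:-0}"
}

# Succeed only when the job output on stdin reports no undefined nodes.
verify_ok() {
    log=$(cat)
    undefined=$(printf '%s\n' "$log" | counter_value UNDEFINED)
    [ "$undefined" -eq 0 ]
}

# Possible usage (assumes the counters are logged to stderr):
#   ./goraci.sh Verify gci_verify_1 2 2>&1 | verify_ok || echo "undefined nodes found"
```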
<p>Below are some timing statistics for running Goraci on a 10 node cluster.
</p>
-<p><code>
- Store | Task | Time | Undef | Unref | Ref
<br />
-
----------------+------------------------+---------+--------+-------+------------
- accumulo-1.4.0 | Generator 10 100000000 | 40m 16s | N/A | N/A |
N/A <br />
- accumulo-1.4.0 | Verify /tmp/goraci1 40 | 6m 7s | 0 | 0 |
1000000000<br />
- hbase-0.92.1 | Generator 10 100000000 | 2h 44m | N/A | N/A |
N/A <br />
- hbase-0.92.1 | Verify /tmp/goraci2 40 | 6m 34s | 0 | 0 |
1000000000
-</code></p>
+<div class="codehilite"><pre>Store           | Task                   | Time    | Undef  | Unref | Ref
+----------------+------------------------+---------+--------+-------+------------
+accumulo-1.4.0  | Generator 10 100000000 | 40m 16s | N/A    | N/A   | N/A
+accumulo-1.4.0  | Verify /tmp/goraci1 40 | 6m 7s   | 0      | 0     | 1000000000
+hbase-0.92.1    | Generator 10 100000000 | 2h 44m  | N/A    | N/A   | N/A
+hbase-0.92.1    | Verify /tmp/goraci2 40 | 6m 34s  | 0      | 0     | 1000000000
+</pre></div>
+
+
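<p>To compare the Generator rows, the timings can be converted into nodes written per second. This is a rough sketch that assumes the total node count is the number of mappers times the nodes per mapper, which matches the 1000000000 in the Ref column. The function name <code>nodes_per_second</code> is hypothetical.</p>

```shell
# Nodes written per second, using integer arithmetic.
# $1 = mappers, $2 = nodes per mapper, $3 = elapsed seconds
nodes_per_second() {
    echo $(( $1 * $2 / $3 ))
}

# Elapsed times taken from the table: 40m 16s = 2416 s, 2h 44m = 9840 s
nodes_per_second 10 100000000 2416   # accumulo-1.4.0 Generator
nodes_per_second 10 100000000 9840   # hbase-0.92.1 Generator
```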
<p>HBase and Accumulo are configured differently out-of-the-box. We used the
Accumulo
3G, native configuration examples in the <a
href="https://github.com/apache/gora/tree/master/gora-goraci/src/main/resources">conf/examples</a>
directory.</p>
<p>To provide a comparable memory footprint, we increased the HBase JVM heap to
"-Xmx4000m",
and turned on compression for the ci table:</p>
-<p><code>
-create 'ci', {NAME=>'meta', COMPRESSION=>'GZ'}
-</code></p>
+<div class="codehilite"><pre><span class="n">create</span> <span
class="s">'ci'</span><span class="p">,</span> <span
class="p">{</span><span class="n">NAME</span><span class="p">=</span><span
class="o">></span><span class="s">'meta'</span><span
class="p">,</span> <span class="n">COMPRESSION</span><span
class="p">=</span><span class="o">></span><span
class="s">'GZ'</span><span class="p">}</span>
+</pre></div>
+
+
<p>We also turned down the replication of write-ahead logs to be comparable to
Accumulo:</p>
<div class="codehilite"><pre><span class="nt">&lt;property&gt;</span>
<span class="nt">&lt;name&gt;</span>hbase.regionserver.hlog.replication<span
class="nt">&lt;/name&gt;</span>