Author: kamaci
Date: Sat Oct 24 18:08:54 2015
New Revision: 1710369
URL: http://svn.apache.org/viewvc?rev=1710369&view=rev
Log:
GoraSparkEngine explanation is added.
Modified:
gora/site/trunk/content/current/gora-core.md
gora/site/trunk/content/current/index.md
Modified: gora/site/trunk/content/current/gora-core.md
URL:
http://svn.apache.org/viewvc/gora/site/trunk/content/current/gora-core.md?rev=1710369&r1=1710368&r2=1710369&view=diff
==============================================================================
--- gora/site/trunk/content/current/gora-core.md (original)
+++ gora/site/trunk/content/current/gora-core.md Sat Oct 24 18:08:54 2015
@@ -9,7 +9,7 @@ Every module
in gora depends on gora-core therefore most of the generic documentation
about the project is gathered here as well as the documentation for
<code>AvroStore</code>,
<code>DataFileAvroStore</code> and <code>MemStore</code>. In addition to this,
gora-core holds all of the
-core **MapReduce**, **Persistency**, **Query**, **DataStoreBase** and
**Utility** functionality.
+core **MapReduce**, **GoraSparkEngine**, **Persistency**, **Query**,
**DataStoreBase** and **Utility** functionality.
[TOC]
@@ -122,3 +122,39 @@ MemStore would be configured exactly the
##MemStore XML mappings
In the stores covered within the gora-core module, no physical mappings are
required.
+#GoraSparkEngine
+##Description
+GoraSparkEngine is the Spark backend of Apache Gora. Assume that the input and output
data stores are:
+
+ DataStore<K1, V1> inStore;
+ DataStore<K2, V2> outStore;
+
+The first step in using GoraSparkEngine is to initialize it:
+
+ GoraSparkEngine<K1, V1> goraSparkEngine = new GoraSparkEngine<>(K1.class,
V1.class);
+
+Construct a `JavaSparkContext`. Register the input data store's value class as
a Kryo class:
+
+ SparkConf sparkConf = new SparkConf().setAppName("Gora Spark Integration Application").setMaster("local");
+ Class[] c = new Class[1];
+ c[0] = inStore.getPersistentClass();
+ sparkConf.registerKryoClasses(c);
+ JavaSparkContext sc = new JavaSparkContext(sparkConf);
+
+A `JavaPairRDD` can be retrieved from the input data store:
+
+ JavaPairRDD<Long, Pageview> goraRDD = goraSparkEngine.initialize(sc,
inStore);
+
+After that, all Spark functionality can be applied. For example, a count
can be run as follows:
+
+ long count = goraRDD.count();
+
+Map and Reduce functions can be run on a `JavaPairRDD` as well. Assume that
this is the variable after map/reduce is applied:
+
+ JavaPairRDD<String, MetricDatum> mapReducedGoraRdd;
+
+The result can be written as follows:
+
+ Configuration sparkHadoopConf =
goraSparkEngine.generateOutputConf(outStore);
+ mapReducedGoraRdd.saveAsNewAPIHadoopDataset(sparkHadoopConf);
+
Modified: gora/site/trunk/content/current/index.md
URL:
http://svn.apache.org/viewvc/gora/site/trunk/content/current/index.md?rev=1710369&r1=1710368&r2=1710369&view=diff
==============================================================================
--- gora/site/trunk/content/current/index.md (original)
+++ gora/site/trunk/content/current/index.md Sat Oct 24 18:08:54 2015
@@ -31,7 +31,7 @@ following modules are currently implemen
* [gora-shims-hadoop-1.x](./gora-shims.html): Module enabling us to use Gora
with Hadoop 1.X;
* [gora-shims-hadoop-2.x](./gora-shims.html): Module enabling us to use Gora
with Hadoop 2.X;
* [gora-shims-hadoop-distribution](./gora-shims.html): Packaging container
module enabling easier dependency management whilst working with Gora Shims;
-* [gora-core](./gora-core.html): Module containing core functionality,
AvroStore and DataFileAvroStore stores;
+* [gora-core](./gora-core.html): Module containing core functionality,
AvroStore and DataFileAvroStore stores, GoraSparkEngine;
* [gora-accumulo](./gora-accumulo.html): Module for [Apache
Accumulo](http://accumulo.apache.org) backend and AccumuloStore implementation;
* [camel-gora](./gora-camel.html): An [Apache Camel](http://camel.apache.org/)
component that allows you to work with NoSQL databases using Gora;
* [gora-cassandra](./gora-cassandra.html): Module for [Apache
Cassandra](http://cassandra.apache.org) backend and CassandraStore
implementation;