Author: kamaci
Date: Sat Oct 24 18:08:54 2015
New Revision: 1710369

URL: http://svn.apache.org/viewvc?rev=1710369&view=rev
Log:
GoraSparkEngine explanation is added.

Modified:
    gora/site/trunk/content/current/gora-core.md
    gora/site/trunk/content/current/index.md

Modified: gora/site/trunk/content/current/gora-core.md
URL: http://svn.apache.org/viewvc/gora/site/trunk/content/current/gora-core.md?rev=1710369&r1=1710368&r2=1710369&view=diff
==============================================================================
--- gora/site/trunk/content/current/gora-core.md (original)
+++ gora/site/trunk/content/current/gora-core.md Sat Oct 24 18:08:54 2015
@@ -9,7 +9,7 @@ Every module
 in gora depends on gora-core therefore most of the generic documentation 
 about the project is gathered here as well as the documentation for <code>AvroStore</code>, 
 <code>DataFileAvroStore</code> and <code>MemStore</code>. In addition to this, gora-core holds all of the 
-core **MapReduce**, **Persistency**, **Query**, **DataStoreBase** and **Utility** functionality.
+core **MapReduce**, **GoraSparkEngine**, **Persistency**, **Query**, **DataStoreBase** and **Utility** functionality.
 
 [TOC]
 
@@ -122,3 +122,39 @@ MemStore would be configured exactly the
 ##MemStore XML mappings
 In the stores covered within the gora-core module, no physical mappings are required.
 
+#GoraSparkEngine
+##Description
+GoraSparkEngine is the Spark backend of Apache Gora. Assume that the input and output data stores are:
+
+    DataStore<K1, V1> inStore;
+    DataStore<K2, V2> outStore;
+
+The first step in using GoraSparkEngine is to initialize it:
+
+    GoraSparkEngine<K1, V1> goraSparkEngine = new GoraSparkEngine<>(K1.class, V1.class);
+
+Construct a `JavaSparkContext`. Register the input data store’s value class as a Kryo class:
+
+    SparkConf sparkConf = new SparkConf().setAppName("Gora Spark Integration Application").setMaster("local");
+    Class[] c = new Class[1];
+    c[0] = inStore.getPersistentClass();
+    sparkConf.registerKryoClasses(c);
+    JavaSparkContext sc = new JavaSparkContext(sparkConf);
+
+A `JavaPairRDD` can be retrieved from the input data store:
+
+    JavaPairRDD<Long, Pageview> goraRDD = goraSparkEngine.initialize(sc, inStore);
+
+After that, all Spark functionality can be applied. For example, a count can be run as follows:
+
+    long count = goraRDD.count();
+
+Map and reduce functions can be run on a `JavaPairRDD` as well. Assume that this is the variable after map/reduce is applied:
+
+    JavaPairRDD<String, MetricDatum> mapReducedGoraRdd;
+
+The result can be written as follows:
+
+    Configuration sparkHadoopConf = goraSparkEngine.generateOutputConf(outStore);
+    mapReducedGoraRdd.saveAsNewAPIHadoopDataset(sparkHadoopConf);
+

Modified: gora/site/trunk/content/current/index.md
URL: http://svn.apache.org/viewvc/gora/site/trunk/content/current/index.md?rev=1710369&r1=1710368&r2=1710369&view=diff
==============================================================================
--- gora/site/trunk/content/current/index.md (original)
+++ gora/site/trunk/content/current/index.md Sat Oct 24 18:08:54 2015
@@ -31,7 +31,7 @@ following modules are currently implemen
 * [gora-shims-hadoop-1.x](./gora-shims.html): Module enabling us to use Gora with Hadoop 1.X;
 * [gora-shims-hadoop-2.x](./gora-shims.html): Module enabling us to use Gora with Hadoop 2.X;
 * [gora-shims-hadoop-distribution](./gora-shims.html): Packaging container module enabling easier dependency management whilst working with Gora Shims;
-* [gora-core](./gora-core.html): Module containing core functionality, AvroStore and DataFileAvroStore stores;
+* [gora-core](./gora-core.html): Module containing core functionality, AvroStore and DataFileAvroStore stores, GoraSparkEngine;
 * [gora-accumulo](./gora-accumulo.html): Module for [Apache Accumulo](http://accumulo.apache.org) backend and AccumuloStore implementation;
 * [camel-gora](./gora-camel.html): An [Apache Camel](http://camel.apache.org/) component that allows you to work with NoSQL databases using Gora;
 * [gora-cassandra](./gora-cassandra.html): Module for [Apache Cassandra](http://cassandra.apacheorg) backend and CassandraStore implementation;

