Author: buildbot
Date: Mon Dec 14 20:19:25 2015
New Revision: 975540
Log:
Production update by buildbot for camel
Modified:
websites/production/camel/content/apache-spark.html
websites/production/camel/content/cache/main.pageCache
Modified: websites/production/camel/content/apache-spark.html
==============================================================================
--- websites/production/camel/content/apache-spark.html (original)
+++ websites/production/camel/content/apache-spark.html Mon Dec 14 20:19:25 2015
@@ -84,17 +84,17 @@
<tbody>
<tr>
<td valign="top" width="100%">
-<div class="wiki-content maincontent"><h2
id="ApacheSpark-ApacheSparkcomponent">Apache Spark component</h2><div
class="confluence-information-macro
confluence-information-macro-information"><span class="aui-icon aui-icon-small
aui-iconfont-info confluence-information-macro-icon"></span><div
class="confluence-information-macro-body"><p> Apache Spark component is
available starting from Camel
<strong>2.17</strong>.</p></div></div><p> </p><p><span style="line-height:
1.5625;font-size: 16.0px;">This documentation page covers the <a shape="rect"
class="external-link" href="http://spark.apache.org/">Apache Spark</a>
component for Apache Camel. The main purpose of the Spark integration with
Camel is to provide a bridge between Camel connectors and Spark tasks. In
particular, a Camel connector provides a way to route messages from various
transports, dynamically choose a task to execute, use the incoming message as
input data for that task, and finally deliver the results of the execution
back to the Camel pipeline.</span></p><h3
id="ApacheSpark-Supportedarchitecturalstyles"><span>Supported architectural
styles</span></h3><p><span style="line-height: 1.5625;font-size: 16.0px;">Spark
component can be used as a driver application deployed into an application
server (or executed as a fat jar).</span></p><p><span style="line-height:
1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper
confluence-embedded-manual-size"><img class="confluence-embedded-image"
height="250" src="apache-spark.data/camel_spark_driver.png"
data-image-src="/confluence/download/attachments/61331559/camel_spark_driver.png?version=2&modificationDate=1449478362000&api=v2"
data-unresolved-comment-count="0" data-linked-resource-id="61331563"
data-linked-resource-version="2" data-linked-resource-type="attachment"
data-linked-resource-default-alias="camel_spark_driver.png"
data-base-url="https://cwiki.apache.org/confluence"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="61331559"
data-linked-resource-container-version="13"></span><br
clear="none"></span></p><p><span style="line-height: 1.5625;font-size:
16.0px;">Spark component can also be submitted as a job directly into the Spark
cluster.</span></p><p><span style="line-height: 1.5625;font-size:
16.0px;"><span class="confluence-embedded-file-wrapper
confluence-embedded-manual-size"><img class="confluence-embedded-image"
height="250" src="apache-spark.data/camel_spark_cluster.png"
data-image-src="/confluence/download/attachments/61331559/camel_spark_cluster.png?version=1&modificationDate=1449478393000&api=v2"
data-unresolved-comment-count="0" data-linked-resource-id="61331565"
data-linked-resource-version="1" data-linked-resource-type="attachment"
data-linked-resource-default-alias="camel_spark_cluster.png"
data-base-url="https://cwiki.apache.org/confluence"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="61331559"
data-linked-resource-container-version="13"></span><br clear="none"></span></p><p><span
style="line-height: 1.5625;font-size: 16.0px;">While the Spark component is primarily
designed to work as a <em>long running job</em> serving as a bridge
between the Spark cluster and other endpoints, you can also use it as a
<em>fire-once</em> short job.  </span> </p><h3
id="ApacheSpark-RunningSparkinOSGiservers"><span>Running Spark in OSGi
servers</span></h3><p>Currently the Spark component doesn't support execution
in an OSGi container. Spark has been designed to be executed as a fat jar,
usually submitted as a job to a cluster. For those reasons, running Spark in an
OSGi server is challenging at best and is not supported by Camel either.</p><h3
id="ApacheSpark-URIformat">URI format</h3><p>Currently the Spark component
supports only producers: it is intended to invoke a Spark job and return
the results. You can call an RDD, DataFrame or Hive SQL
job.</p><div><p> </p><div class="code panel pdl"
style="border-width: 1px;"><div class="codeHeader panelHeader pdl"
style="border-bottom-width: 1px;"><b>Spark URI format</b></div><div
class="codeContent panelContent pdl">
+<div class="wiki-content maincontent"><h2
id="ApacheSpark-ApacheSparkcomponent">Apache Spark component</h2><div
class="confluence-information-macro
confluence-information-macro-information"><span class="aui-icon aui-icon-small
aui-iconfont-info confluence-information-macro-icon"></span><div
class="confluence-information-macro-body"><p> Apache Spark component is
available starting from Camel
<strong>2.17</strong>.</p></div></div><p> </p><p><span style="line-height:
1.5625;font-size: 16.0px;">This documentation page covers the <a shape="rect"
class="external-link" href="http://spark.apache.org/">Apache Spark</a>
component for Apache Camel. The main purpose of the Spark integration with
Camel is to provide a bridge between Camel connectors and Spark tasks. In
particular, a Camel connector provides a way to route messages from various
transports, dynamically choose a task to execute, use the incoming message as
input data for that task, and finally deliver the results of the execution
back to the Camel pipeline.</span></p><h3
id="ApacheSpark-Supportedarchitecturalstyles"><span>Supported architectural
styles</span></h3><p><span style="line-height: 1.5625;font-size: 16.0px;">Spark
component can be used as a driver application deployed into an application
server (or executed as a fat jar).</span></p><p><span style="line-height:
1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper
confluence-embedded-manual-size"><img class="confluence-embedded-image"
height="250" src="apache-spark.data/camel_spark_driver.png"
data-image-src="/confluence/download/attachments/61331559/camel_spark_driver.png?version=2&modificationDate=1449478362000&api=v2"
data-unresolved-comment-count="0" data-linked-resource-id="61331563"
data-linked-resource-version="2" data-linked-resource-type="attachment"
data-linked-resource-default-alias="camel_spark_driver.png"
data-base-url="https://cwiki.apache.org/confluence"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="61331559"
data-linked-resource-container-version="14"></span><br
clear="none"></span></p><p><span style="line-height: 1.5625;font-size:
16.0px;">Spark component can also be submitted as a job directly into the Spark
cluster.</span></p><p><span style="line-height: 1.5625;font-size:
16.0px;"><span class="confluence-embedded-file-wrapper
confluence-embedded-manual-size"><img class="confluence-embedded-image"
height="250" src="apache-spark.data/camel_spark_cluster.png"
data-image-src="/confluence/download/attachments/61331559/camel_spark_cluster.png?version=1&modificationDate=1449478393000&api=v2"
data-unresolved-comment-count="0" data-linked-resource-id="61331565"
data-linked-resource-version="1" data-linked-resource-type="attachment"
data-linked-resource-default-alias="camel_spark_cluster.png"
data-base-url="https://cwiki.apache.org/confluence"
data-linked-resource-content-type="image/png"
data-linked-resource-container-id="61331559"
data-linked-resource-container-version="14"></span><br clear="none"></span></p><p><span
style="line-height: 1.5625;font-size: 16.0px;">While the Spark component is primarily
designed to work as a <em>long running job</em> serving as a bridge
between the Spark cluster and other endpoints, you can also use it as a
<em>fire-once</em> short job.  </span> </p><h3
id="ApacheSpark-RunningSparkinOSGiservers"><span>Running Spark in OSGi
servers</span></h3><p>Currently the Spark component doesn't support execution
in an OSGi container. Spark has been designed to be executed as a fat jar,
usually submitted as a job to a cluster. For those reasons, running Spark in an
OSGi server is challenging at best and is not supported by Camel either.</p><h3
id="ApacheSpark-URIformat">URI format</h3><p>Currently the Spark component
supports only producers: it is intended to invoke a Spark job and return
the results. You can call an RDD, DataFrame or Hive SQL
job.</p><div><p> </p><div class="code panel pdl"
style="border-width: 1px;"><div class="codeHeader panelHeader pdl"
style="border-bottom-width: 1px;"><b>Spark URI format</b></div><div
class="codeContent panelContent pdl">
<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[spark:{rdd|dataframe|hive}]]></script>
</div></div><p> </p></div><h3 id="ApacheSpark-RDDjobs">RDD
jobs </h3><p> </p><div>To invoke an RDD job, use the following
URI:</div><div class="code panel pdl" style="border-width: 1px;"><div
class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark
RDD producer</b></div><div class="codeContent panelContent pdl">
<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[spark:rdd?rdd=#testFileRdd&rddCallback=#transformation]]></script>
-</div></div><p> Where <code>rdd</code> option refers to the name of an
RDD instance (subclass of
<code>org.apache.spark.api.java.AbstractJavaRDDLike</code>) from a Camel
registry, while <code>rddCallback</code> refers to the implementation
of <code>org.apache.camel.component.spark.RddCallback</code> class (also
from a registry). RDD callback provides a single method used to apply incoming
messages against the given RDD. Results of callback computations are saved as a
body to an exchange.</p><div class="code panel pdl" style="border-width:
1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width:
1px;"><b>Spark RDD callback</b></div><div class="codeContent panelContent pdl">
+</div></div><p> Where <code>rdd</code> option refers to the name of an
RDD instance (subclass of
<code>org.apache.spark.api.java.AbstractJavaRDDLike</code>) from a Camel
registry, while <code>rddCallback</code> refers to the implementation
of <code>org.apache.camel.component.spark.RddCallback</code> interface
(also from a registry). RDD callback provides a single method used to apply
incoming messages against the given RDD. Results of callback computations are
saved as a body to an exchange.</p><div class="code panel pdl"
style="border-width: 1px;"><div class="codeHeader panelHeader pdl"
style="border-bottom-width: 1px;"><b>Spark RDD callback</b></div><div
class="codeContent panelContent pdl">
<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[public interface RddCallback<T> {
T onRdd(AbstractJavaRDDLike rdd, Object... payloads);
}]]></script>
</div></div><p>The following snippet demonstrates how to send a message as
input to the job and return the results:</p><div class="code panel pdl"
style="border-width: 1px;"><div class="codeHeader panelHeader pdl"
style="border-bottom-width: 1px;"><b>Calling spark job</b></div><div
class="codeContent panelContent pdl">
<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[String pattern = "job input";
-long linesCount =
producerTemplate.requestBody("spark:rdd?myRdd=#testFileRdd&rddCallback=#countLinesContaining",
pattern, long.class);]]></script>
+long linesCount =
producerTemplate.requestBody("spark:rdd?rdd=#myRdd&rddCallback=#countLinesContaining",
pattern, long.class);]]></script>
</div></div><p>The RDD callback for the snippet above, registered as a Spring
bean, could look as follows:</p><div class="code panel pdl" style="border-width:
1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width:
1px;"><b>Spark RDD callback</b></div><div class="codeContent panelContent pdl">
<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[@Bean
RddCallback<Long> countLinesContaining() {
@@ -177,7 +177,34 @@ public class MyTransformation {
// Convert String "10" to integer
long result =
producerTemplate.requestBody("spark:rdd?rdd=#rdd&rddCallback=#rddCallback",
Arrays.asList(10, "10"), long.class);]]></script>
-</div></div><p></p><h3 id="ApacheSpark-SeeAlso">See Also</h3>
+</div></div><p> </p><h3 id="ApacheSpark-DataFramejobs">DataFrame
jobs</h3><p> </p><p>Instead of working with RDDs, the Spark component can also
work with DataFrames. </p><div>To invoke a DataFrame job, use the
following URI:</div><div class="code panel pdl" style="border-width: 1px;"><div
class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark
DataFrame producer</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[spark:dataframe?dataFrame=#testDataFrame&dataFrameCallback=#transformation]]></script>
+</div></div><p> Where the <code>dataFrame</code> option refers to
the name of a DataFrame instance (an instance of
<code>org.apache.spark.sql.DataFrame</code>) from a Camel registry,
while <code style="line-height:
1.42857;">dataFrameCallback</code> refers to an implementation
of the <code style="line-height:
1.42857;">org.apache.camel.component.spark.DataFrameCallback</code> interface
(also from a registry). The DataFrame callback provides a single method used to
apply incoming messages against the given DataFrame. The result of the callback
computation is saved as the body of the exchange.</p><div class="code panel pdl"
style="border-width: 1px;"><div class="codeHeader panelHeader pdl"
style="border-bottom-width: 1px;"><b>Spark DataFrame callback</b></div><div
class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[public interface DataFrameCallback<T> {
+ T onDataFrame(DataFrame dataFrame, Object... payloads);
+}]]></script>
+</div></div><p>The following snippet demonstrates how to send a message as
input to a job and return the results:</p><div class="code panel pdl"
style="border-width: 1px;"><div class="codeHeader panelHeader pdl"
style="border-bottom-width: 1px;"><b>Calling spark job</b></div><div
class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[String model = "Micra";
+long carsCount =
producerTemplate.requestBody("spark:dataframe?dataFrame=#cars&dataFrameCallback=#findCarWithModel",
model, long.class);]]></script>
+</div></div><p>The DataFrame callback for the snippet above, registered as
a Spring bean, could look as follows:</p><div class="code panel pdl"
style="border-width: 1px;"><div class="codeHeader panelHeader pdl"
style="border-bottom-width: 1px;"><b>Spark DataFrame callback</b></div><div
class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[@Bean
+DataFrameCallback<Long> findCarWithModel() {
+ return new DataFrameCallback<Long>() {
+ @Override
+ public Long onDataFrame(DataFrame dataFrame, Object... payloads) {
+ String model = (String) payloads[0];
+ return
dataFrame.where(dataFrame.col("model").eqNullSafe(model)).count();
+ }
+ };
+}]]></script>
+</div></div><p>The DataFrame definition in Spring could look as
follows:</p><div class="code panel pdl" style="border-width: 1px;"><div
class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark
DataFrame definition</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default"
type="syntaxhighlighter"><![CDATA[@Bean
+DataFrame cars(HiveContext hiveContext) {
+ DataFrame jsonCars =
hiveContext.read().json("/var/data/cars.json");
+ jsonCars.registerTempTable("cars");
+ return jsonCars;
+}]]></script>
+</div></div><p> </p><h4 id="ApacheSpark-DataFramejobsoptions">DataFrame
jobs options</h4><div class="table-wrap"><table
class="confluenceTable"><tbody><tr><th colspan="1" rowspan="1"
class="confluenceTh">Option</th><th colspan="1" rowspan="1"
class="confluenceTh">Description</th><th colspan="1" rowspan="1"
class="confluenceTh">Default value</th></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><code>dataFrame</code></td><td colspan="1" rowspan="1"
class="confluenceTd">DataFrame instance (instance
of <code><span>org.apache.spark.</span><span>sql</span>.DataFrame</code>).</td><td
colspan="1" rowspan="1"
class="confluenceTd"><code>null</code></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><code>dataFrameCallback</code></td><td colspan="1"
rowspan="1" class="confluenceTd">Instance
of <code>org.apache.camel.component.spark.DataFrameCallback</code> interface.</td><td
colspan="1" rowspan="1" class="confluenceTd"><code><span style="color:
rgb(0,51,102);">null</span></code></td></tr></tbody></table></div><p> </p><p></p><h3
id="ApacheSpark-SeeAlso">See Also</h3>
<ul><li><a shape="rect" href="configuring-camel.html">Configuring
Camel</a></li><li><a shape="rect"
href="component.html">Component</a></li><li><a shape="rect"
href="endpoint.html">Endpoint</a></li><li><a shape="rect"
href="getting-started.html">Getting Started</a></li></ul></div>
</td>
<td valign="top">
Modified: websites/production/camel/content/cache/main.pageCache
==============================================================================
Binary files - no diff available.