[
https://issues.apache.org/jira/browse/HBASE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717234#comment-14717234
]
Hadoop QA commented on HBASE-14158:
-----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12752802/HBASE-14158.2.patch
against master branch at commit 8f95318f6252c1c0b7a073619525eae6d991f47b.
ATTACHMENT ID: 12752802
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+0 tests included{color}. The patch appears to be a
documentation patch that doesn't require tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0).
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the
total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the
total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines
longer than 100:
+Apache Spark is a software framework that is used to process data in
memory in a distributed manner, and is replacing MapReduce in many use cases.
+Spark itself is out of scope of this document, please refer to the Spark site
for more information on the Spark project and subprojects. This document will
focus on 4 main interaction points between Spark and HBase. Those interaction
points are:
+2.Spark Streaming: The ability to have a HBase Connection at any point in your
Spark Streaming application.
+4.SparkSQL/DataFrames: The ability to write SparkSQL that draws on tables that
are represented in HBase.
+Here we will talk about Spark HBase integration at the lowest and simplest
levels. All the other interaction points are built upon the concepts that will
be described here.
+At the root of all Spark and HBase integration is the HBaseContext. The
HBaseContext takes in HBase configurations and pushes them to the Spark
executors. This allows us to have an HBase Connection per Spark Executor in a
static location.
+Just for reference Spark Executors can be on the same nodes as the Region
Servers or on different nodes there is no dependence of co-location. Think of
every Spark Executor as a multi-threaded client application.
+Here is a simple example of how the HBaseContext can be used. In this example
we are doing a foreachPartition on a RDD in Scala.
+If Java is perferred instead of Scala it will look a little different but
still vary possible as we can see with this example.
+All functionality between Spark and HBase will be supported both in Scala and
in Java, with the exception of SparkSQL which will support any language that is
supported by Spark. For the remaining of this documentation we will focus on
Scala examples for now.
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/15300//testReport/
Release Findbugs (version 2.0.3) warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/15300//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/15300//artifact/patchprocess/checkstyle-aggregate.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/15300//console
This message is automatically generated.
> Add documentation for Initial Release for HBase-Spark Module integration
> -------------------------------------------------------------------------
>
> Key: HBASE-14158
> URL: https://issues.apache.org/jira/browse/HBASE-14158
> Project: HBase
> Issue Type: Improvement
> Components: documentation, spark
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Fix For: 2.0.0
>
> Attachments: HBASE-14158.1.patch, HBASE-14158.2.patch
>
>
> Add documentation for Initial Release for HBase-Spark Module integration