[jira] [Commented] (HBASE-14158) Add documentation for Initial Release for HBase-Spark Module integration

Hadoop QA (JIRA) Thu, 27 Aug 2015 11:28:20 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717234#comment-14717234
 ]


Hadoop QA commented on HBASE-14158:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12752802/HBASE-14158.2.patch
  against master branch at commit 8f95318f6252c1c0b7a073619525eae6d991f47b.
  ATTACHMENT ID: 12752802

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

    {color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
    +Apache Spark is a software framework that is used to process data in 
memory in a distributed manner, and is replacing MapReduce in many use cases.
+Spark itself is out of the scope of this document; please refer to the Spark 
site for more information on the Spark project and subprojects.  This document 
will focus on 4 main interaction points between Spark and HBase.  Those 
interaction points are:
+2. Spark Streaming: The ability to have an HBase Connection at any point in 
your Spark Streaming application.
+4. SparkSQL/DataFrames: The ability to write SparkSQL that draws on tables that 
are represented in HBase.
+Here we will talk about Spark HBase integration at the lowest and simplest 
levels.  All the other interaction points are built upon the concepts that will 
be described here.
+At the root of all Spark and HBase integration is the HBaseContext.  The 
HBaseContext takes in HBase configurations and pushes them to the Spark 
executors.  This allows us to have an HBase Connection per Spark Executor in a 
static location.
+Just for reference, Spark Executors can be on the same nodes as the Region 
Servers or on different nodes; there is no dependence on co-location.  Think of 
every Spark Executor as a multi-threaded client application.
+Here is a simple example of how the HBaseContext can be used.  In this example 
we are doing a foreachPartition on an RDD in Scala.
+If Java is preferred instead of Scala, it will look a little different but is 
still very possible, as we can see with this example.
+All functionality between Spark and HBase will be supported both in Scala and 
in Java, with the exception of SparkSQL, which will support any language that is 
supported by Spark.  For the remainder of this documentation we will focus on 
Scala examples.
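A minimal plain-Java sketch can illustrate the HBaseContext idea quoted above: the configuration reaches the executor, one HBase Connection per executor is kept in a static location, and each foreachPartition call borrows it. This is not the hbase-spark API. FakeConnection, ExecutorConnectionCache and foreachPartition are hypothetical stand-ins, the configuration is reduced to a String key, and the "partitions" are plain lists processed inside one JVM, which plays the role of a single Spark Executor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

public class ExecutorConnectionSketch {

    /** Hypothetical stand-in for an HBase Connection; it only records the rows "put" through it. */
    static final class FakeConnection {
        static final AtomicInteger CREATED = new AtomicInteger(); // counts how many connections were opened
        final String config;
        final List<String> written = new ArrayList<>();

        FakeConnection(String config) {
            this.config = config;
            CREATED.incrementAndGet();
        }

        synchronized void put(String row) {
            written.add(row);
        }
    }

    /** Hypothetical static, JVM-wide cache: at most one connection per configuration in this JVM. */
    static final class ExecutorConnectionCache {
        private static final Map<String, FakeConnection> CACHE = new ConcurrentHashMap<>();

        static FakeConnection get(String config) {
            // computeIfAbsent opens the connection only on first use in this JVM, then reuses it
            return CACHE.computeIfAbsent(config, FakeConnection::new);
        }
    }

    /** Hypothetical foreachPartition: passes each partition's iterator plus the cached connection to f. */
    static <T> void foreachPartition(List<List<T>> partitions, String config,
                                     BiConsumer<Iterator<T>, FakeConnection> f) {
        for (List<T> partition : partitions) {
            f.accept(partition.iterator(), ExecutorConnectionCache.get(config));
        }
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("row1", "row2"), Arrays.asList("row3"), Arrays.asList("row4", "row5"));

        // The user function sees only an iterator and a connection, as in the quoted description
        foreachPartition(partitions, "zk=localhost:2181", (it, conn) -> {
            while (it.hasNext()) {
                conn.put(it.next());
            }
        });

        FakeConnection conn = ExecutorConnectionCache.get("zk=localhost:2181");
        System.out.println("connections created: " + FakeConnection.CREATED.get());
        System.out.println("rows written: " + conn.written.size());
    }
}
```

Running main prints "connections created: 1" and "rows written: 5". All three partitions reuse the single cached connection instead of each opening their own. In a real Spark job every executor JVM would hold its own cached instance, which is why executors do not need to be co-located with the Region Servers.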

    {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

    {color:red}-1 core tests{color}.  The patch failed these unit tests:
     

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15300//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15300//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15300//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15300//console

This message is automatically generated.

> Add documentation for Initial Release for HBase-Spark Module integration 
> -------------------------------------------------------------------------
>
>                 Key: HBASE-14158
>                 URL: https://issues.apache.org/jira/browse/HBASE-14158
>             Project: HBase
>          Issue Type: Improvement
>          Components: documentation, spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14158.1.patch, HBASE-14158.2.patch
>
>
> Add documentation for Initial Release for HBase-Spark Module integration 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
