[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617215#comment-14617215 ]

Hadoop QA commented on HBASE-13867:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12743985/HBASE-13867.1.patch
  against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6.
  ATTACHMENT ID: 12743985

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified tests.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than 100:
    +HBase Coprocessors are modeled after the Coprocessors which are part of Google's BigTable (http://static.googleusercontent.com/media/research.google.com/en//people/jeff/SOCC2010-keynote-slides.pdf, pages 41-42.). +
    +Coprocessor is a framework that provides an easy way to run your custom code directly on Region Server.
    +. Mingjie Lai's blog post link:https://blogs.apache.org/hbase/entry/coprocessor_introduction[Coprocessor Introduction].
    +. Gaurav Bhardwaj's blog post link:http://www.3pillarglobal.com/insights/hbase-coprocessors[The How To Of HBase Coprocessors].
    +When working with any data store (like RDBMS or HBase) you fetch the data (in case of RDBMS you might use SQL query and in case of HBase you use either Get or Scan). To fetch only relevant data you filter it (for RDBMS you put conditions in 'WHERE' clause and in HBase you use Filters). After fetching the desired data, you perform your business computation on the data. +
    +This scenario is close to ideal for "small data", where few thousand rows and a bunch of columns are returned from the data store. Now imagine a scenario where there are billions of rows and millions of columns and you want to perform some computation which requires all the data, like calculating average or sum. Even if you are interested in just few columns, you still have to fetch all the rows. There are a few drawbacks in this approach as described below:
    +. In this approach the data transfer (from data store to client side) will become the bottleneck, and the time required to complete the operation is limited by the rate at which data transfer is taking place.
    +. Bandwidth is one of the most precious resources in any data center. Operations like this will severely impact the performance of your cluster.
    +. Your client code is becoming thick as you are maintaining the code for calculating average or summation on client side. Not a major drawback when talking of severe issues like performance/bandwidth but still worth giving consideration.
    +In a scenario like this it's better to move the computation (i.e. user's custom code) to the data itself (Region Server). Coprocessor helps you achieve this but you can do more than that. There is another advantage that your code runs in parallel (i.e. on all Regions). To give an idea of Coprocessor's capabilities, different people give different analogies. The three most famous analogies for Coprocessor present in the industry are:

    {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

    {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):
        at org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137)
        at org.apache.oozie.test.XTestCase$MiniClusterShutdownMonitor.run(XTestCase.java:1071)
        at org.apache.oozie.test.XTestCase.waitFor(XTestCase.java:692)
        at org.apache.oozie.action.hadoop.TestMapReduceActionExecutor.testSetExecutionStats_when_user_has_specified_stats_write_TRUE(TestMapReduceActionExecutor.java:976)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//artifact/patchprocess/checkstyle-aggregate.html

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//console

This message is automatically generated.
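The `lineLengths` vote above flags lines that the patch adds and that are longer than 100 characters. Below is a minimal Java sketch of such a check. It is not the precommit script's actual code. It assumes a unified diff in which added lines start with a single `+` and the `+++` line is the new-file header. It also assumes that length is measured after stripping the leading `+`; whether the real check counts that prefix, tabs, or trailing whitespace is not stated in the report. `LongLineCheck` and `longAddedLines` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a "lines longer than N" check over a unified diff. */
public class LongLineCheck {

    /** Returns added lines (leading '+' stripped) whose length exceeds maxLen. */
    static List<String> longAddedLines(List<String> patchLines, int maxLen) {
        List<String> offenders = new ArrayList<>();
        for (String line : patchLines) {
            // "+++ b/path" names the new file; it is a header, not an added line.
            if (line.startsWith("+++")) {
                continue;
            }
            // Only added lines count; context (' ') and removed ('-') lines are ignored.
            if (line.startsWith("+")) {
                String added = line.substring(1);
                if (added.length() > maxLen) {
                    offenders.add(added);
                }
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Hypothetical patch excerpt: a file header, a short added line,
        // one over-long added line, and an over-long context line.
        List<String> patch = List.of(
                "+++ b/example.adoc",
                "+short added line",
                "+" + "x".repeat(120),
                " " + "y".repeat(120));
        for (String line : longAddedLines(patch, 100)) {
            System.out.println("+" + line.substring(0, 20) + "... (" + line.length() + " chars)");
        }
    }
}
```

Only the 120-character added line is reported. The over-long context line is skipped because the check is about lines the patch introduces, not lines it merely touches.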

> Add endpoint coprocessor guide to HBase book
> --------------------------------------------
>
>                 Key: HBASE-13867
>                 URL: https://issues.apache.org/jira/browse/HBASE-13867
>             Project: HBase
>          Issue Type: Task
>          Components: Coprocessors, documentation
>            Reporter: Vladimir Rodionov
>            Assignee: Gaurav Bhardwaj
>         Attachments: HBASE-13867.1.patch
>
>
> Endpoint coprocessors are very poorly documented.
> Coprocessor section of HBase book must be updated either with its own 
> endpoint coprocessors HOW-TO guide or, at least, with the link(s) to some 
> other guides. There is good description here:
> http://www.3pillarglobal.com/insights/hbase-coprocessors
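The patch text quoted in the `lineLengths` report motivates endpoint coprocessors with a sum/average example. Rather than shipping every row to the client, each region computes a small result next to its data, and the client combines those results. Below is a plain-Java simulation of that idea. It does not use the HBase coprocessor API: a real endpoint is defined as a protobuf service and must be loaded on the Region Servers. The in-memory `long[]` "regions" and the `PartialAverage`/`Partial` names are hypothetical stand-ins.

```java
import java.util.List;

/** Simulation only (not HBase API): per-region partial aggregation for an average. */
public class PartialAverage {

    /** Hypothetical partial result a region returns instead of its raw rows. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        /** Merging is associative, with (0, 0) as the identity element. */
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    /** Simulates the server-side step: aggregate one region's values locally. */
    static Partial aggregateRegion(long[] regionValues) {
        long sum = 0;
        for (long v : regionValues) {
            sum += v;
        }
        return new Partial(sum, regionValues.length);
    }

    /** Simulates the client-side step: merge the small partials, then divide once. */
    static double average(List<long[]> regions) {
        Partial total = regions.parallelStream() // regions are independent, so aggregate them in parallel
                .map(PartialAverage::aggregateRegion)
                .reduce(new Partial(0, 0), Partial::merge);
        if (total.count == 0) {
            throw new IllegalArgumentException("no rows in any region");
        }
        return (double) total.sum / total.count;
    }

    public static void main(String[] args) {
        // Three simulated regions; the last one holds no rows.
        List<long[]> regions = List.of(new long[] {1, 2, 3}, new long[] {10}, new long[] {});
        System.out.println(average(regions)); // 16 / 5 = 3.2
    }
}
```

Each region returns a (sum, count) pair rather than its own average. Averaging the per-region averages here would give (2 + 10) / 2 = 6 instead of 3.2, and the empty region has no average at all. Because merging pairs is associative and has an identity, the per-region work can run in parallel. This mirrors the quoted point that coprocessor code runs on all Regions.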



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
