[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617215#comment-14617215 ]
Hadoop QA commented on HBASE-13867:
-----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12743985/HBASE-13867.1.patch
against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6.
ATTACHMENT ID: 12743985
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the
total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the
total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines
longer than 100:
+HBase Coprocessors are modeled after the Coprocessors which are part of
Google's BigTable
(http://static.googleusercontent.com/media/research.google.com/en//people/jeff/SOCC2010-keynote-slides.pdf,
pages 41-42.). +
+Coprocessor is a framework that provides an easy way to run your custom code
directly on Region Server.
+. Mingjie Lai's blog post
link:https://blogs.apache.org/hbase/entry/coprocessor_introduction[Coprocessor
Introduction].
+. Gaurav Bhardwaj's blog post
link:http://www.3pillarglobal.com/insights/hbase-coprocessors[The How To Of
HBase Coprocessors].
+When working with any data store (like RDBMS or HBase) you fetch the data (in
case of RDBMS you might use SQL query and in case of HBase you use either Get
or Scan). To fetch only relevant data you filter it (for RDBMS you put
conditions in 'WHERE' clause and in HBase you use Filters). After fetching the
desired data, you perform your business computation on the data. +
+This scenario is close to ideal for "small data", where few thousand rows and
a bunch of columns are returned from the data store. Now imagine a scenario
where there are billions of rows and millions of columns and you want to
perform some computation which requires all the data, like calculating average
or sum. Even if you are interested in just few columns, you still have to fetch
all the rows. There are a few drawbacks in this approach as described below:
+. In this approach the data transfer (from data store to client side) will
become the bottleneck, and the time required to complete the operation is
limited by the rate at which data transfer is taking place.
+. Bandwidth is one of the most precious resources in any data center.
Operations like this will severely impact the performance of your cluster.
+. Your client code is becoming thick as you are maintaining the code for
calculating average or summation on client side. Not a major drawback when
talking of severe issues like performance/bandwidth but still worth giving
consideration.
+In a scenario like this it's better to move the computation (i.e. user's
custom code) to the data itself (Region Server). Coprocessor helps you achieve
this but you can do more than that. There is another advantage that your code
runs in parallel (i.e. on all Regions). To give an idea of Coprocessor's
capabilities, different people give different analogies. The three most famous
analogies for Coprocessor present in the industry are:
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137)
at org.apache.oozie.test.XTestCase$MiniClusterShutdownMonitor.run(XTestCase.java:1071)
at org.apache.oozie.test.XTestCase.waitFor(XTestCase.java:692)
at org.apache.oozie.action.hadoop.TestMapReduceActionExecutor.testSetExecutionStats_when_user_has_specified_stats_write_TRUE(TestMapReduceActionExecutor.java:976)
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//console
This message is automatically generated.
> Add endpoint coprocessor guide to HBase book
> --------------------------------------------
>
> Key: HBASE-13867
> URL: https://issues.apache.org/jira/browse/HBASE-13867
> Project: HBase
> Issue Type: Task
> Components: Coprocessors, documentation
> Reporter: Vladimir Rodionov
> Assignee: Gaurav Bhardwaj
> Attachments: HBASE-13867.1.patch
>
>
> Endpoint coprocessors are very poorly documented.
> The Coprocessor section of the HBase book must be updated either with its own
> endpoint coprocessor HOW-TO guide or, at least, with links to other
> guides. There is a good description here:
> http://www.3pillarglobal.com/insights/hbase-coprocessors
--
This message was sent by Atlassian JIRA (v6.3.4#6332)