[jira] [Commented] (HBASE-5982) HBase Coprocessor Locate

Zhihong Yu (JIRA) Fri, 11 May 2012 09:51:11 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273413#comment-13273413
 ]


Zhihong Yu commented on HBASE-5982:
-----------------------------------

{code}
+ public <R, S> List<R> top(final byte[] tableName,
{code}
Please indent the method declaration with two leading spaces. See HBASE-5961 
for an Eclipse formatter.
{code}
+     List<R> getMax() {
+       return top;
+     }
{code}
MaxCallBack originally returned one scalar. Please create a new CallBack, 
TopCallBack, for the 'top' operation.
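
A minimal sketch of what such a separate callback could look like, so that MaxCallBack keeps returning a single scalar. Everything here is hypothetical and not taken from the patch: the class body, the getTop() name and the Comparator parameter are illustrative. The update(region, row, result) shape is modeled on Batch.Callback, and the class is written without HBase imports so it compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: accumulates the topNum largest values seen across
// all per-region results. Not the patch's implementation.
class TopCallBack<R> {
  private final int topNum;
  private final Comparator<R> comparator;
  // Min-heap: the smallest retained value is at the head and is evicted first.
  private final PriorityQueue<R> heap;

  TopCallBack(int topNum, Comparator<R> comparator) {
    if (topNum <= 0) {
      throw new IllegalArgumentException("topNum must be positive");
    }
    this.topNum = topNum;
    this.comparator = comparator;
    this.heap = new PriorityQueue<R>(topNum, comparator);
  }

  // Called once per region with that region's partial top list.
  // The region and row arguments are unused in this sketch.
  synchronized void update(byte[] region, byte[] row, List<R> result) {
    if (result == null) {
      return;
    }
    for (R value : result) {
      if (heap.size() < topNum) {
        heap.add(value);
      } else if (comparator.compare(value, heap.peek()) > 0) {
        // The new value beats the current smallest, so replace it.
        heap.poll();
        heap.add(value);
      }
    }
  }

  // Returns the retained values, largest first.
  synchronized List<R> getTop() {
    List<R> sorted = new ArrayList<R>(heap);
    Collections.sort(sorted, Collections.reverseOrder(comparator));
    return sorted;
  }
}
```

Keeping only topNum values in the heap bounds the client-side memory regardless of how many regions report back.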
{code}
+       //log.info("****the top"+topNum+ " value come from RegionServer="
{code}
Please remove all commented-out log statements.
{code}
+ public <R, S> R max1(final byte[] tableName,
{code}
Please don't rename an existing public method; doing so would cause a regression.
                
> HBase Coprocessor Locate
> ------------------------
>
>                 Key: HBASE-5982
>                 URL: https://issues.apache.org/jira/browse/HBASE-5982
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors
>    Affects Versions: 0.92.1
>         Environment: cloudera-cdh3u3,hbase-0.92.1
>            Reporter: dengpeng
>              Labels: Coprocessor
>             Fix For: 0.92.1
>
>         Attachments: HBASE-5982.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> In our application, we need to run the following SQL-like processing on 
> HBase. Each region performs complex processing, and in the current 
> region-based endpoint framework the 'top #' result from every region is 
> sent back to the coprocessor client. 
> Take the SQL below as an example. Suppose each RS hosts 100 regions and the 
> cluster has 100 RSs. The client will receive 100*100*1M = 10G records from 
> all the regions and then select the top 1M records from those 10G records. 
> The client needs a lot of RAM to handle this data, and the cluster network 
> may become the bottleneck.
> With an RS-based endpoint, each RS would handle the partial results from 
> its own regions, so the client would receive only 100*1M = 0.1G records. The 
> burden on the client and the network would be dramatically reduced. 
> Example: 
> select top 1000000 count(1) as A , sum(intRxlevDL)/count(intRxlevDL) as B , 
> intBscPc as bscPc , intLac as LAC , intCI as CI from ftbMrMsg t1 where ( 
> t1.dtTime >= '2012-03-02 04:00:00.000' and t1.dtTime < '2012-03-02 
> 05:00:00.000' ) group by bscPc , LAC , CI having B >= 0.2 order by bscPc ASC , 
> LAC ASC , CI ASC
> So far, the network is a bottleneck in our application when using 
> coprocessors to handle the above SQL. I think an RS-based endpoint is worth 
> doing, especially for the 'top #' process. What's your opinion about this? 
> I think we can open a JIRA. 
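
The record counts in the description above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the class and method names are hypothetical, and it assumes every region (or every RS, in the RS-based case) returns a full top-N list.

```java
// Hypothetical helper: estimates how many records reach the client.
class TopTransferEstimate {
  // Region-based endpoint: every region sends its own top-N to the client.
  static long regionBased(long regionServers, long regionsPerServer, long topN) {
    return regionServers * regionsPerServer * topN;
  }

  // RS-based endpoint: each region server merges its regions' results
  // and sends a single top-N to the client.
  static long serverBased(long regionServers, long topN) {
    return regionServers * topN;
  }

  public static void main(String[] args) {
    long regionServers = 100;
    long regionsPerServer = 100;
    long topN = 1000000L; // "top 1000000" from the example query

    System.out.println(regionBased(regionServers, regionsPerServer, topN)); // 10000000000 (10G)
    System.out.println(serverBased(regionServers, topN));                   // 100000000 (0.1G)
  }
}
```

Under these assumptions the RS-based endpoint reduces the number of records sent to the client by a factor equal to the number of regions per RS.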

