[ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723800#comment-13723800
 ] 

Cheney Sun commented on HBASE-9086:
-----------------------------------

Hi Jean-Marc, thanks for pointing out the possible approaches, but they don't
look like what I want. Both methods you mentioned would send some K/V pairs
back to the client executing the command, right? In our case, that would hurt
performance a lot. Let me briefly describe our case: the table schema is [
rowkey (20~30 bytes) | a:info (100~500+ kB) | c:ref (empty in most rows) ].
Specifying only the column a:info wouldn't help much, since that column
carries most of the payload. Specifying only c:ref wouldn't give the correct
result, because most cells in that column are empty and would not be counted.
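
To make the trade-off concrete, here is a toy in-memory model of the schema
above. It is only a sketch, not HBase client code: the helper names, the row
count, and the choice to treat an "empty" c:ref as a missing cell are all
assumptions made for illustration.

```python
# Toy in-memory model of the table described above; not HBase client code.
# Each row maps a column name to its value bytes; an absent column means
# no cell is stored for that row (assumption: "empty" means "no cell").

def make_table(num_rows=10, info_size=100 * 1024, ref_every=5):
    """Build rows with a 20-byte key, a large a:info cell, and a sparse c:ref."""
    table = {}
    for i in range(num_rows):
        rowkey = f"row-{i:016d}".encode()  # 4 + 16 = 20 bytes
        cells = {"a:info": b"x" * info_size}
        if i % ref_every == 0:  # only a few rows carry c:ref
            cells["c:ref"] = b"r"
        table[rowkey] = cells
    return table


def count_by_column(table, column):
    """Count rows holding `column`; return (row_count, value_bytes_shipped)."""
    rows = 0
    shipped = 0
    for cells in table.values():
        if column in cells:
            rows += 1
            shipped += len(cells[column])
    return rows, shipped


table = make_table()
info_rows, info_bytes = count_by_column(table, "a:info")
ref_rows, ref_bytes = count_by_column(table, "c:ref")
print("a:info ->", info_rows, "rows,", info_bytes, "value bytes")
print("c:ref  ->", ref_rows, "rows,", ref_bytes, "value bytes")
```

In this model, counting on a:info finds every row but ships the whole
payload, while counting on c:ref ships almost nothing but misses most rows.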

Apparently, reading only the row key is the natural way to improve count
performance while still guaranteeing a correct result. Moreover, when using the
count command, the user really cares about the number of rows, not the data.
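
A rough estimate of the difference, using the sizes from the schema above.
The row count is a hypothetical figure, and the calculation ignores per-cell
overhead (family, qualifier, timestamp) and RPC framing, so treat it as an
order-of-magnitude sketch only.

```python
# Back-of-the-envelope estimate of bytes shipped to the client during a count.
ROWS = 1_000_000            # hypothetical row count, chosen for illustration
KEY_BYTES_MAX = 30          # upper end of the 20~30 byte row key
INFO_BYTES_MIN = 100 * 1024  # lower end of the 100~500+ kB a:info cell


def transferred_bytes(rows, bytes_per_row):
    """Total bytes shipped if every row contributes bytes_per_row."""
    return rows * bytes_per_row


key_only = transferred_bytes(ROWS, KEY_BYTES_MAX)
with_info = transferred_bytes(ROWS, KEY_BYTES_MAX + INFO_BYTES_MIN)

print(f"key only : {key_only / 1e6:.1f} MB")
print(f"with info: {with_info / 1e9:.1f} GB")
print(f"ratio    : {with_info / key_only:.0f}x")
```

Even with the largest key and the smallest a:info value, shipping only keys
moves several thousand times less data in this estimate.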

For now, I'm not sure whether such a patch would be easy to implement under
the current HBase architecture.
                
> Add some options to improve count performance
> ---------------------------------------------
>
>                 Key: HBASE-9086
>                 URL: https://issues.apache.org/jira/browse/HBASE-9086
>             Project: HBase
>          Issue Type: Wish
>          Components: shell
>    Affects Versions: 0.94.2
>            Reporter: Cheney Sun
>
> The current count command in the HBase shell is quite slow if the rows are
> very big (100+ kB each). It would be helpful to provide an option to specify
> the column to count, which would give the user a chance to reduce the data
> volume to scan.
> IMHO, counting only the row key would be the ideal solution. Not sure how
> difficult it is to implement.
