[ https://issues.apache.org/jira/browse/HBASE-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705656#action_12705656 ]

stack commented on HBASE-32:
----------------------------

HFile does some of this already:

+ average key length
+ average value length
+ key count
+ entries in block and meta index
+ last key in file

These are easy to add if we need more.

Yeah, we should leverage it, if only to read all the metadata in one region and
then extrapolate (as you suggest).
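A minimal sketch of the "read one region's metadata and extrapolate" idea, under several assumptions. `FileStats`, `estimateTableRows`, `cellsPerRow` and `regionCount` are hypothetical names, not HBase API. The sketch assumes that the per-file key count counts cells rather than rows, and that the sampled region is representative of the others. Summing counts across a region's store files can also double-count keys that appear in more than one file.

```java
import java.util.List;

/**
 * Hypothetical sketch: extrapolate a table-wide row estimate from the
 * file metadata of a single sampled region. FileStats, cellsPerRow and
 * regionCount are illustrative assumptions, not HBase API.
 */
public class RegionExtrapolation {

    /** Stand-in for per-file trailer stats such as HFile's key count. */
    record FileStats(long keyCount) {}

    /**
     * Sums key counts over one region's files, converts keys to rows using an
     * assumed average number of cells per row, then scales by the region count.
     */
    static long estimateTableRows(List<FileStats> sampledRegionFiles,
                                  double cellsPerRow,
                                  int regionCount) {
        if (cellsPerRow <= 0 || regionCount <= 0) {
            throw new IllegalArgumentException("cellsPerRow and regionCount must be positive");
        }
        long keysInRegion = 0;
        for (FileStats f : sampledRegionFiles) {
            keysInRegion += f.keyCount();
        }
        double rowsInRegion = keysInRegion / cellsPerRow;
        return Math.round(rowsInRegion * regionCount);
    }

    public static void main(String[] args) {
        List<FileStats> files = List.of(new FileStats(12_000), new FileStats(8_000));
        // 20,000 keys / 4 cells per row = 5,000 rows in the sampled region; x 10 regions
        System.out.println(estimateTableRows(files, 4.0, 10)); // prints 50000
    }
}
```

Sampling more than one region and averaging would reduce the error when regions differ much in size.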

> [hbase] Add row count estimator
> -------------------------------
>
>                 Key: HBASE-32
>                 URL: https://issues.apache.org/jira/browse/HBASE-32
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>            Reporter: stack
>            Priority: Minor
>         Attachments: 2291_v01.patch, Keying.java
>
>
> Internally we have a little tool that does a rough estimate of how many
> rows there are in an HBase table.  It runs scanners over larger and larger
> partitions until it turns up more than N occupied rows.  Once it has a count
> greater than N, it multiplies by the partition size to get an approximate row
> count.  This issue is about generalizing this feature so that it could ship
> with the general HBase install.  It would look something like:
> {code}
> long getApproximateRowCount(final Text startRow, final Text endRow,
>     final long minimumCountPerPartition, final long maximumPartitionSize);
> {code}
> Larger minimumCountPerPartition and maximumPartitionSize values would make 
> the count more accurate but would mean the method ran longer.
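A minimal sketch of the growing-partition estimator described in the issue, under stated assumptions. Row keys are modeled as `long` values in `[startRow, endRow)` instead of `Text`. The hypothetical `counter` argument stands in for running a scanner over a sub-range and counting occupied rows. Partitions are taken from the start of the range, doubling until they hold more than `minimumCountPerPartition` rows or reach `maximumPartitionSize`. "Multiplies by the partition size" is interpreted here as scaling the sampled count by `range / partitionSize`. This is only accurate if rows are spread roughly uniformly over the key range.

```java
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.LongBinaryOperator;

public class ApproximateRowCount {

    /**
     * Hypothetical sketch of the proposed estimator. counter.applyAsLong(from, to)
     * stands in for scanning [from, to) and counting occupied rows.
     * Assumes rows are spread roughly uniformly over [startRow, endRow).
     */
    static long getApproximateRowCount(long startRow, long endRow,
                                       long minimumCountPerPartition,
                                       long maximumPartitionSize,
                                       LongBinaryOperator counter) {
        if (maximumPartitionSize < 1) {
            throw new IllegalArgumentException("maximumPartitionSize must be >= 1");
        }
        long range = endRow - startRow;
        if (range <= 0) {
            return 0;
        }
        long partition = 1;
        while (true) {
            // Never scan past the end of the range or beyond the size limit.
            long size = Math.min(Math.min(partition, range), maximumPartitionSize);
            long count = counter.applyAsLong(startRow, startRow + size);
            boolean enough = count > minimumCountPerPartition;
            boolean atLimit = size == range || size == maximumPartitionSize;
            if (enough || atLimit) {
                // Scale the sampled count up to the whole key range.
                return Math.round(count * ((double) range / size));
            }
            partition = size * 2; // grow the partition and rescan
        }
    }

    public static void main(String[] args) {
        NavigableSet<Long> keys = new TreeSet<>();
        for (long k = 0; k < 1000; k += 10) {
            keys.add(k); // 100 evenly spaced rows
        }
        LongBinaryOperator counter = (from, to) -> keys.subSet(from, true, to, false).size();
        // First partition with more than 5 rows is [0, 64): 7 rows, scaled by 1000/64.
        System.out.println(getApproximateRowCount(0, 1000, 5, 1000, counter)); // prints 109
    }
}
```

The example shows the trade-off the issue describes. A small `minimumCountPerPartition` stops early with a rough answer (109 for 100 rows). Raising it, or letting the partition reach the whole range, costs a longer scan but converges on the exact count.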

-- 
This message is automatically generated by JIRA.