[
https://issues.apache.org/jira/browse/HBASE-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexandre Normand updated HBASE-9243:
-------------------------------------
Attachment: HBASE-9243-2.patch
I had used the conservative yammer metrics style but, in this case, the count
should be accurate, so I'm changing it to report {{count = <value>}}.
As for your question about the number of distinct value lengths versus the
number of values: strictly speaking, it's the count of value-length samples. In
practice, that is the same as the number of values in that file.
Here's the updated output from that last version:
{code}
Stats:
Key length:
min = 29.00
max = 29.00
mean = 29.00
stddev = 0.00
median = 29.00
75% <= 29.00
95% <= 29.00
98% <= 29.00
99% <= 29.00
99.9% <= 29.00
count = 15
Row size (bytes):
min = 6921.00
max = 455286.00
mean = 320933.67
stddev = 160821.04
median = 346302.00
75% <= 455190.00
95% <= 455286.00
98% <= 455286.00
99% <= 455286.00
99.9% <= 455286.00
count = 15
Row size (columns):
min = 110.00
max = 7595.00
mean = 5352.73
stddev = 2685.21
median = 5775.00
75% <= 7594.00
95% <= 7595.00
98% <= 7595.00
99% <= 7595.00
99.9% <= 7595.00
count = 15
Val length:
min = 19.00
max = 55.00
mean = 22.96
stddev = 1.48
median = 23.00
75% <= 23.00
95% <= 23.00
98% <= 23.00
99% <= 23.00
99.9% <= 54.00
count = 80291
{code}
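For readers who want to see how figures like these relate to the raw samples, here is a minimal, self-contained sketch. It assumes exact statistics over every sample. The patch itself relies on yammer's histograms, which may sample rather than keep every value, so this is not the tool's implementation. {{SampleStats}} and its methods are hypothetical names, and the quantile rule (linear interpolation at rank q * (n + 1)) is just one common convention, chosen here as an assumption.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration; not part of HBase or yammer metrics.
class SampleStats {
    private final double[] sorted;

    SampleStats(long... samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        sorted = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            sorted[i] = samples[i];
        }
        Arrays.sort(sorted);
    }

    // One sample per recorded value, so this is also the number of values.
    long count() { return sorted.length; }
    double min() { return sorted[0]; }
    double max() { return sorted[sorted.length - 1]; }

    double mean() {
        double sum = 0;
        for (double v : sorted) sum += v;
        return sum / sorted.length;
    }

    // Sample standard deviation (n - 1 denominator); 0 for a single sample.
    double stdDev() {
        if (sorted.length < 2) return 0.0;
        double m = mean();
        double squares = 0;
        for (double v : sorted) squares += (v - m) * (v - m);
        return Math.sqrt(squares / (sorted.length - 1));
    }

    // Quantile q in [0, 1], linearly interpolated at rank q * (n + 1).
    // Assumption: one common convention, not confirmed to match the tool.
    double quantile(double q) {
        double pos = q * (sorted.length + 1);
        if (pos < 1) return sorted[0];
        if (pos >= sorted.length) return sorted[sorted.length - 1];
        double lower = sorted[(int) pos - 1];
        double upper = sorted[(int) pos];
        return lower + (pos - Math.floor(pos)) * (upper - lower);
    }

    // Formats the statistics in the same layout as the output above.
    String report(String title) {
        StringBuilder sb = new StringBuilder(title).append(":\n");
        sb.append(String.format(Locale.ROOT, "min = %.2f\n", min()));
        sb.append(String.format(Locale.ROOT, "max = %.2f\n", max()));
        sb.append(String.format(Locale.ROOT, "mean = %.2f\n", mean()));
        sb.append(String.format(Locale.ROOT, "stddev = %.2f\n", stdDev()));
        sb.append(String.format(Locale.ROOT, "median = %.2f\n", quantile(0.5)));
        double[] qs = {0.75, 0.95, 0.98, 0.99, 0.999};
        String[] labels = {"75%", "95%", "98%", "99%", "99.9%"};
        for (int i = 0; i < qs.length; i++) {
            sb.append(String.format(Locale.ROOT, "%s <= %.2f\n", labels[i], quantile(qs[i])));
        }
        sb.append(String.format(Locale.ROOT, "count = %d\n", count()));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up value lengths, purely for illustration.
        SampleStats valLength = new SampleStats(19, 23, 23, 23, 55);
        System.out.print(valLength.report("Val length"));
    }
}
```

Because every recorded value contributes exactly one sample, {{count()}} here equals the number of values, which is the sense in which the count of value-length samples matches the number of values in the file.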
> Add more useful statistics in the HFile tool
> --------------------------------------------
>
> Key: HBASE-9243
> URL: https://issues.apache.org/jira/browse/HBASE-9243
> Project: HBase
> Issue Type: Improvement
> Components: HFile
> Affects Versions: 0.96.0
> Reporter: Alexandre Normand
> Priority: Minor
> Labels: newbie
> Attachments: HBASE-9243-1.patch, HBASE-9243-2.patch, HBASE-9243.patch
>
>
> The [HFile tool|http://hbase.apache.org/book/regions.arch.html#hfile_tool]
> has been very useful to us recently for getting a better idea of the size of
> our rows. However, we frequently wished for more statistics to get a more
> complete picture of the distribution of the row sizes.
> [~skuehn] requested that feature often enough in private that I decided to
> give it a go.
> Here's the patch that adds more nice little stats via yammer's histograms. It
> was easy enough since {{com.yammer.metrics}} is already in hbase's
> dependencies.
> Example of the new output from {{org.apache.hadoop.hbase.io.hfile.HFile -s -f
> ...}}:
> {code}
> Stats:
> Key length:
> min = 24.00
> max = 24.00
> mean = 24.00
> stddev = 0.00
> median = 24.00
> 75% <= 24.00
> 95% <= 24.00
> 98% <= 24.00
> 99% <= 24.00
> 99.9% <= 24.00
> Row size (bytes):
> min = 33.00
> max = 33.00
> mean = 33.00
> stddev = 0.00
> median = 33.00
> 75% <= 33.00
> 95% <= 33.00
> 98% <= 33.00
> 99% <= 33.00
> 99.9% <= 33.00
> Row size (columns):
> min = 1.00
> max = 1.00
> mean = 1.00
> stddev = 0.00
> median = 1.00
> 75% <= 1.00
> 95% <= 1.00
> 98% <= 1.00
> 99% <= 1.00
> 99.9% <= 1.00
> Val length:
> min = 1.00
> max = 1.00
> mean = 1.00
> stddev = 0.00
> median = 1.00
> 75% <= 1.00
> 95% <= 1.00
> 98% <= 1.00
> 99% <= 1.00
> 99.9% <= 1.00
> Key of biggest row: \x00
> {code}