[ https://issues.apache.org/jira/browse/HBASE-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902528#comment-15902528 ]
stack commented on HBASE-17756:
-------------------------------
What stats do we want on an HFile?
+ Rough count of occurrences of each key?
+ Similar for key/value sizes?
+ Versions of Cells in an HFile (HBASE-12311 Version stats in HFiles?)
+ HBASE-7958 talked of row key distribution and cardinality, as well as column
family/column qualifier cardinality and a bunch of other possibilities.
Later we could merge up per-HFile stats to make a region stat...
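To make the stats listed above concrete, here is a minimal Python sketch. It assumes a hypothetical `Cell` tuple (not the HBase `Cell` interface) and uses exact counting. It accumulates the cell count, key/value size summaries, row and column cardinality, and the maximum number of versions per column for one file. Its `merge()` shows how per-HFile stats could roll up into a region stat. None of the names below are HBase or DataSketches APIs. A real implementation would more likely replace the exact `Counter`/`set` fields with DataSketches sketch families (frequent items, distinct count, quantiles) to keep memory bounded.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, NamedTuple


class Cell(NamedTuple):
    # Hypothetical stand-in for an HBase Cell; not the real HBase API.
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


@dataclass
class HFileStats:
    cell_count: int = 0
    key_bytes: int = 0
    value_bytes: int = 0
    max_key_size: int = 0
    max_value_size: int = 0
    # Exact per-row cell counts; a frequent-items sketch would bound memory.
    row_counts: Counter = field(default_factory=Counter)
    # Versions seen per (row, family, qualifier) coordinate.
    column_versions: Counter = field(default_factory=Counter)
    families: set = field(default_factory=set)
    columns: set = field(default_factory=set)  # (family, qualifier) pairs

    def add(self, cell: Cell) -> None:
        # Simplified key size: ignores timestamp, type and length-prefix bytes.
        key_size = len(cell.row) + len(cell.family) + len(cell.qualifier)
        value_size = len(cell.value)
        self.cell_count += 1
        self.key_bytes += key_size
        self.value_bytes += value_size
        self.max_key_size = max(self.max_key_size, key_size)
        self.max_value_size = max(self.max_value_size, value_size)
        self.row_counts[cell.row] += 1
        self.column_versions[(cell.row, cell.family, cell.qualifier)] += 1
        self.families.add(cell.family)
        self.columns.add((cell.family, cell.qualifier))

    def merge(self, other: "HFileStats") -> "HFileStats":
        # Roll two files' stats up, e.g. into a region-level stat.
        return HFileStats(
            cell_count=self.cell_count + other.cell_count,
            key_bytes=self.key_bytes + other.key_bytes,
            value_bytes=self.value_bytes + other.value_bytes,
            max_key_size=max(self.max_key_size, other.max_key_size),
            max_value_size=max(self.max_value_size, other.max_value_size),
            row_counts=self.row_counts + other.row_counts,
            column_versions=self.column_versions + other.column_versions,
            families=self.families | other.families,
            columns=self.columns | other.columns,
        )

    def summary(self) -> dict:
        n = self.cell_count
        return {
            "cells": n,
            "distinct_rows": len(self.row_counts),
            "distinct_families": len(self.families),
            "distinct_columns": len(self.columns),
            "mean_key_size": self.key_bytes / n if n else 0.0,
            "mean_value_size": self.value_bytes / n if n else 0.0,
            "max_key_size": self.max_key_size,
            "max_value_size": self.max_value_size,
            "max_versions": max(self.column_versions.values(), default=0),
        }


def stats_for(cells: Iterable[Cell]) -> HFileStats:
    # One pass over a file's cells, as a stats-writing or inspection tool might do.
    stats = HFileStats()
    for cell in cells:
        stats.add(cell)
    return stats
```

Every field merges with a sum, max, or union, so per-file stats combine in any order. Sketch-based replacements would need the same mergeable property to support a region stat.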
> We should have better introspection of HFiles
> ---------------------------------------------
>
> Key: HBASE-17756
> URL: https://issues.apache.org/jira/browse/HBASE-17756
> Project: HBase
> Issue Type: Brainstorming
> Components: HFile
> Reporter: Esteban Gutierrez
>
> [[email protected]] was suggesting using DataSketches
> (https://datasketches.github.io) to write additional statistics to
> the HFiles. These could be used to improve our split decisions, help
> with troubleshooting, or potentially enable other interesting analysis
> without having to perform full table scans. The statistics could be stored
> as part of the HFile, but initially we could improve the visibility of the
> data by adding some statistics to HFilePrettyPrinter.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)