[ https://issues.apache.org/jira/browse/HBASE-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110527#comment-17110527 ]
Michael Stack commented on HBASE-17756:
---------------------------------------
[~shahrs87] added a sub-issue with a patch for hbase-operator-tools that adds
a basic 'table reporter' tool: it just reads the table and then generates
histograms of size and column count. It can be expanded upon. Ideas on what
else to 'sketch', given that it is already reading all of the data, are
appreciated. (This can be the basis for the Region/Table PrettyPrinter we talk
of above.)
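To make the idea concrete, here is a minimal sketch of that kind of report. It
is not the tool's actual code: the in-memory `sample_rows` list stands in for a
real table scan, and `bucket` and `report` are hypothetical names. Rows are
counted into power-of-two buckets by total byte size and by column count.

```python
from collections import Counter


def bucket(value):
    """Return the lower bound of the power-of-two bucket that holds value."""
    if value <= 0:
        return 0
    return 1 << (value.bit_length() - 1)


def report(rows):
    """Build size and column-count histograms over rows.

    rows is an iterable of (row_key, cells) where cells is a list of
    (qualifier, value) byte-string pairs. This shape is an assumption made
    for illustration, not the HBase client API.
    """
    size_hist = Counter()
    column_hist = Counter()
    for row_key, cells in rows:
        # Approximate row size as key bytes plus qualifier and value bytes.
        row_size = len(row_key) + sum(len(q) + len(v) for q, v in cells)
        size_hist[bucket(row_size)] += 1
        column_hist[bucket(len(cells))] += 1
    return size_hist, column_hist


# Hypothetical sample data standing in for a full table scan.
sample_rows = [
    (b"r1", [(b"a", b"x" * 10)]),
    (b"r2", [(b"a", b"x" * 10), (b"b", b"y" * 100)]),
    (b"r3", [(b"a", b""), (b"b", b""), (b"c", b"z" * 1000)]),
]

size_hist, column_hist = report(sample_rows)
for lower, count in sorted(size_hist.items()):
    print(f"row size >= {lower} bytes: {count}")
for lower, count in sorted(column_hist.items()):
    print(f"columns >= {lower}: {count}")
```

Fixed buckets keep the sketch short; a quantile sketch such as the ones
DataSketches provides could replace the counters if finer percentiles are
wanted.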
A row view is hard to integrate into a running HBase because, as we say above,
we deal in cells, with the notion of a row being a read-time construct. Even at
compaction time it is all Cells, and usually only a subset of them unless it is
a major compaction. So it seems like the HFile is the focus server-side?
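As a rough illustration of the 'row is a read-time construct' point, the
sketch below assembles rows from a sorted stream of cells. It is heavily
simplified and hypothetical: cells are plain tuples, and column families,
delete markers and version limits are all ignored. The name `rows_from_cells`
is made up for this example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical cells as (row, qualifier, timestamp, value), already sorted
# by row so that cells of the same row are adjacent.
cells = [
    (b"r1", b"a", 2, b"new"),
    (b"r1", b"a", 1, b"old"),
    (b"r1", b"b", 1, b"v"),
    (b"r2", b"a", 1, b"w"),
]


def rows_from_cells(sorted_cells):
    """Group adjacent cells by row key, keeping the newest value per qualifier."""
    for row_key, group in groupby(sorted_cells, key=itemgetter(0)):
        latest = {}
        for _, qualifier, timestamp, value in group:
            if qualifier not in latest or timestamp > latest[qualifier][0]:
                latest[qualifier] = (timestamp, value)
        yield row_key, {q: v for q, (_, v) in latest.items()}


rows = list(rows_from_cells(cells))
for row_key, columns in rows:
    print(row_key, columns)
```

If the input held only the cells of a single file, the same grouping would
yield partial rows, which is why per-file statistics are easier to state in
terms of cells.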
> We should have better introspection of HFiles
> ---------------------------------------------
>
> Key: HBASE-17756
> URL: https://issues.apache.org/jira/browse/HBASE-17756
> Project: HBase
> Issue Type: Brainstorming
> Components: HFile
> Reporter: Esteban Gutierrez
> Assignee: Rushabh Shah
> Priority: Major
>
> [[email protected]] was suggesting to use DataSketches
> (https://datasketches.github.io) in order to write additional statistics to
> the HFiles. This could be used to improve our split decisions,
> troubleshooting or potentially do other interesting analysis without having
> to perform full table scans. The statistics could be stored as part of the
> HFile but we could initially improve the visibility of the data by adding
> some statistics to HFilePrettyPrinter.