[ 
https://issues.apache.org/jira/browse/HBASE-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110527#comment-17110527
 ] 

Michael Stack commented on HBASE-17756:
---------------------------------------

[~shahrs87] I added a sub-issue with a patch for hbase-operator-tools that adds 
a basic 'table reporter' tool; it just reads the table and generates 
histograms of size and column count. It can be expanded upon. Ideas on what 
else to 'sketch', given that it is already reading all of the data, are 
appreciated. (This can be the basis for the Region/Table PrettyPrinter we talk 
of above.)
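
To make the histogram idea concrete, here is a minimal sketch of the kind of bookkeeping such a reporter could do. It is not the code from the hbase-operator-tools patch. The class name TableReportSketch, the power-of-two bucketing, and the modeling of a row as a list of cell sizes in bytes are all assumptions made so the example runs without a cluster. It also treats 'size' as total row size and uses cells per row as a stand-in for column count.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the hbase-operator-tools implementation.
// A row is modeled as a list of cell sizes in bytes (an assumption).
final class TableReportSketch {
    // Histograms keyed by power-of-two bucket upper bound -> number of rows.
    final TreeMap<Long, Long> rowSizeHistogram = new TreeMap<>();
    final TreeMap<Long, Long> columnCountHistogram = new TreeMap<>();

    // Smallest power of two >= value; values <= 1 land in bucket 1.
    static long bucket(long value) {
        if (value <= 1) {
            return 1;
        }
        return Long.highestOneBit(value - 1) << 1;
    }

    // Record one row: its total size and its number of cells.
    void addRow(List<Integer> cellSizes) {
        long rowSize = 0;
        for (int size : cellSizes) {
            rowSize += size;
        }
        rowSizeHistogram.merge(bucket(rowSize), 1L, Long::sum);
        columnCountHistogram.merge(bucket(cellSizes.size()), 1L, Long::sum);
    }

    // Render a histogram as plain text, one bucket per line, ascending.
    static String render(String title, Map<Long, Long> histogram) {
        StringBuilder sb = new StringBuilder(title).append('\n');
        for (Map.Entry<Long, Long> e : histogram.entrySet()) {
            sb.append("  <= ").append(e.getKey())
              .append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

A real tool would feed addRow from a table scan and would probably swap the exact bucket counts for a quantiles sketch so that memory stays bounded on large tables.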

A row view is hard to integrate into a running HBase because, as we say above, 
we deal in cells, with the row notion being a read-time construct. Even at 
compaction time, it's all Cells, and usually only a subset of them unless it is 
a major compaction. So it seems like the HFile is the focus server-side?
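
To illustrate what 'row as a read-time construct' means, here is a small sketch that assembles rows from a row-sorted stream of cells. SimpleCell and assembleRows are hypothetical names for illustration only, not HBase classes or APIs, and the sketch assumes the input is already sorted by row key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types are HBase APIs.
final class RowAssemblySketch {
    // Minimal stand-in for a cell: just a row key and a column qualifier.
    static final class SimpleCell {
        final String row;
        final String qualifier;

        SimpleCell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    // Group consecutive cells that share a row key into one "row".
    // Assumes the input is sorted by row key.
    static List<List<SimpleCell>> assembleRows(List<SimpleCell> sortedCells) {
        List<List<SimpleCell>> rows = new ArrayList<>();
        List<SimpleCell> current = null;
        String currentRow = null;
        for (SimpleCell cell : sortedCells) {
            if (current == null || !cell.row.equals(currentRow)) {
                current = new ArrayList<>();
                rows.add(current);
                currentRow = cell.row;
            }
            current.add(cell);
        }
        return rows;
    }
}
```

If a single file holds only some of a row's cells, this grouping yields a partial row, which is why per-row statistics are awkward to compute anywhere except at read time or over a fully compacted file.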



> We should have better introspection of HFiles
> ---------------------------------------------
>
>                 Key: HBASE-17756
>                 URL: https://issues.apache.org/jira/browse/HBASE-17756
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: HFile
>            Reporter: Esteban Gutierrez
>            Assignee: Rushabh Shah
>            Priority: Major
>
> [[email protected]] was suggesting to use DataSketches 
> (https://datasketches.github.io) in order to write additional statistics to 
> the HFiles. This could be used to improve our split decisions, 
> troubleshooting or potentially do other interesting analysis without having 
> to perform full table scans. The statistics could be stored as part of the 
> HFile but we could initially improve the visibility of the data by adding 
> some statistics to HFilePrettyPrinter.



