[
https://issues.apache.org/jira/browse/HBASE-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109723#comment-17109723
]
Michael Stack commented on HBASE-17756:
---------------------------------------
If you want to take it up from here [~shahrs87], I'd suggest:
* Run the linked Performance Evaluation Test Tool for HFiles to see if the
sketches slow writing; if they don't, then let's make a subtask
to commit recording of sketches at write time (any other things we should
sketch?).
* Then let's commit this to the pretty printer. If you have ideas for how to make
it prettier, just say.
* Then maybe make a new issue for the Region and Table Pretty Printers.
> We should have better introspection of HFiles
> ---------------------------------------------
>
> Key: HBASE-17756
> URL: https://issues.apache.org/jira/browse/HBASE-17756
> Project: HBase
> Issue Type: Brainstorming
> Components: HFile
> Reporter: Esteban Gutierrez
> Assignee: Rushabh Shah
> Priority: Major
>
> [[email protected]] was suggesting to use DataSketches
> (https://datasketches.github.io) in order to write additional statistics to
> the HFiles. This could be used to improve our split decisions, to help with
> troubleshooting, or potentially to do other interesting analysis without having
> to perform full table scans. The statistics could be stored as part of the
> HFile but we could initially improve the visibility of the data by adding
> some statistics to HFilePrettyPrinter.