Hi, I put up a PR to add a new coprocessor to the hbase-examples module. Would love to get the community’s thoughts on it!
*Quick Overview:*
HBase provides many configuration options to help administrators tune their tables and clusters for desired performance and reliability. However, administrators can find it difficult to use these options effectively because they often lack a nuanced understanding of the shape of their data in HBase. The row statistics coprocessor lets administrators collect statistics on the rows in their HBase tables as those rows are compacted. With a clearer picture of the shape of their data, administrators can use HBase's existing configuration options to unlock performance and reliability gains for their tables and clusters.

More details are in the PR: https://github.com/apache/hbase/pull/6327

*Row Statistics In Action:*
At my day job, we've had this coprocessor running on all of our QA and Prod RegionServers for over a year. It has allowed us to:

- Automatically tune the block sizes of our tables that serve predominantly random-read-heavy traffic. This reduced the operational burden on administrators of assessing and rolling out block size changes.
- Power an internal campaign to alert other teams at our company about huge cells in the HBase tables that belong to their part of the product. Huge cells are ticking time bombs in HBase. A block is the smallest unit of data that HBase reads from disk, and because a cell is never split across blocks, a huge cell produces an equally huge block. If a block is too large, HBase cannot cache it in memory. Tracking down these cells improved the reliability of our product, delivering a win for customers.
- Implement smarter major compaction schedules based on the workload patterns hitting each region, especially the writes to that region and its tombstone counts. This reduced our network data transfer costs by nearly $2,000/day.

Thank you for your consideration!

Best,
Evelyn Boland