Hi,

I put up a PR to add a new coprocessor to the hbase-examples module. Would
love to get the community’s thoughts on it!

*Quick Overview:*

HBase provides many configuration options to help administrators tune their
tables and clusters for desired performance and reliability. However, it
can be difficult for administrators to leverage these configuration options
because they do not have a nuanced understanding of the shape of their data
in HBase.

The row statistics coprocessor allows administrators to collect statistics
on the rows in their HBase tables as these rows compact. With more
information about the shape of their data in HBase, administrators can
leverage the available configuration options in HBase to unlock performance
and reliability gains for their tables/clusters.

More in the PR: https://github.com/apache/hbase/pull/6327

*Row Statistics In Action:*

At my day job, we’ve had this coprocessor running in all of our QA and Prod
RegionServers for over a year. It has allowed us to

   - Automatically tune the block sizes of our tables that serve
   predominantly random-read-heavy traffic. This reduced the operational
   burden on administrators of assessing and rolling out block size changes
   to tables.


   - Power an internal campaign to alert other teams at our company about
   huge cells in the HBase tables associated with their part of the
   product. Huge cells are ticking time bombs in HBase. A block is the
   smallest unit of data that HBase can read from disk, and a cell is never
   split across blocks, so a huge cell produces a huge block. If a block is
   too large, HBase cannot cache it in memory. This campaign improved the
   reliability of our product, delivering a win for customers.


   - Implement smarter major compaction schedules based on the workload
   patterns hitting a region, especially writes to that region and tombstone
   counts. This reduced our network data transfer cost by nearly
   $2,000/day.
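
To make the huge-cell point above concrete, here is a minimal sketch in
plain Java (no HBase dependencies) of the kind of check that per-table
statistics make possible. It is not taken from the PR: the HugeCellCheck
and TableStats names, their fields, the table names, and the threshold
factor are all hypothetical. The 64 KB figure is HBase's default block
size.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these names are hypothetical and are not taken from the PR.
final class HugeCellCheck {
    // HBase's default block size (64 KB); column families may override it.
    static final long DEFAULT_BLOCK_SIZE_BYTES = 64L * 1024;

    // Hypothetical per-table summary of collected statistics.
    static final class TableStats {
        final String table;
        final long largestCellBytes;
        final long blockSizeBytes;

        TableStats(String table, long largestCellBytes, long blockSizeBytes) {
            this.table = table;
            this.largestCellBytes = largestCellBytes;
            this.blockSizeBytes = blockSizeBytes;
        }
    }

    // Returns the tables whose largest cell exceeds `factor` times their block size.
    static List<String> tablesToFlag(List<TableStats> stats, double factor) {
        List<String> flagged = new ArrayList<>();
        for (TableStats s : stats) {
            if (s.largestCellBytes > s.blockSizeBytes * factor) {
                flagged.add(s.table);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<TableStats> stats = List.of(
            new TableStats("orders", 10L * 1024 * 1024, DEFAULT_BLOCK_SIZE_BYTES),
            new TableStats("events", 2L * 1024, DEFAULT_BLOCK_SIZE_BYTES));
        System.out.println(tablesToFlag(stats, 1.0)); // prints [orders]
    }
}
```

A factor of 1.0 flags any table with a cell larger than one block; a real
alerting threshold would likely be chosen per cluster.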


Thank you for your consideration!

Best,
Evelyn Boland
