Hi Nick,

Thank you very much for your interest in this feature; I appreciate your thoughts! Also, please accept my apologies for the long delay in responding.
> Outside of your specific use-case, can you propose other reasons why HBase
> should maintain such data partitioning? Are there other features that become
> possible with this enabled? I'd like to work out if this should be a default
> implementation of our storage system, or if this is going to forever sit
> behind a disabled feature flag.

Our plan is to refactor the regular writer/reader code path so that the new section encoding is handled automatically, and we are doing some basic spikes to confirm the feasibility. It would definitely be nice to make this the standard format, but we need to consider that it introduces one more indirection when navigating to a data block, because a new top-level section index is needed. For use cases that need PBE this is justifiable, but for the rest it may seem unnecessary. We can make the call after assessing the performance impact of this change.

As for generalizing this virtual HFile as a form of data partitioning, one possibility is to extend it to things like configs and quotas, but I am not sure about the value add here.

> This does sound like a monstrous changeset. Do you have a strategy for
> delivering this in reviewable pieces?

We don't actually expect this change to be that large, because it will make use of the existing encryption code; the new code is mostly around key management and caching. I have done a spike that gives an idea of the type of changes this will bring in. The changes are available here:

https://github.com/haridsv/hbase/tree/key-meta-poc

We can definitely raise the PRs incrementally. For example, at a coarse level, we can raise a PR for key management first and then follow up with the HFile changes. Within each of these, we can also make the PRs incremental by vertically slicing the features (similar to how the spike changes are structured), following the existing development patterns in HBase.
> Are there meaningful refactorings that should be done to the HFile classes
> along the way?

I think it is too early to answer this. We are aiming to reuse the existing code as much as possible for the new format, and that might require some refactoring, but we can definitely take up any obvious improvements we come across along the way.

> Do you intend to backport to branch-2, or will you be content with getting it
> onto branch-3?

Yes, we would need this change for branch-2 as well.

> ---------- Forwarded message ----------
> From: Nick Dimiduk <ndimi...@apache.org>
> To: dev@hbase.apache.org
> Cc:
> Bcc:
> Date: Thu, 13 Feb 2025 11:18:25 +0100
> Subject: Re: [DISCUSS] Row Prefix Based Encryption (PBE) to encrypt different
> sections of HFile with different keys
> Hi Hari,
>
> I find it interesting, the different creative ways that people imagine
> use for HBase's ordered bytes physical characteristic. Usually those
> ideas are around region placement -- this is pretty clever! Outside of
> your specific use-case, can you propose other reasons why HBase should
> maintain such data partitioning? Are there other features that become
> possible with this enabled? I'd like to work out if this should be a
> default implementation of our storage system, or if this is going to
> forever sit behind a disabled feature flag.
>
> This does sound like a monstrous changeset. Do you have a strategy for
> delivering this in reviewable pieces? Are there meaningful
> refactorings that should be done to the HFile classes along the way?
> Do you intend to backport to branch-2, or will you be content with
> getting it onto branch-3?
>
> Thanks,
> Nick
>