Hi Nick,

Thank you very much for showing interest in this feature; I appreciate
your thoughts! Also, please accept my apologies for the long delay in
responding.

> Outside of your specific use-case, can you propose other reasons why HBase
> should maintain such data partitioning? Are there other features that become
> possible with this enabled? I'd like to work out if this should be a default
> implementation of our storage system, or if this is going to forever sit
> behind a disabled feature flag.

Our plan is to refactor the regular writer/reader code path so that
the new section encoding is handled automatically, and we are doing
some basic spikes to confirm that this is feasible.

That said, it would definitely be nice to make this the standard
format, but we need to consider that it introduces one more level of
indirection when navigating to a data block, because a new top-level
section index has to be consulted first. For use cases that need PBE
this cost is justifiable, but for the rest it may be unnecessary
overhead. We can make that call after measuring the performance impact
of this change.
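
To make the extra indirection concrete, here is a minimal Java sketch
of a two-level lookup. It is purely hypothetical: SectionIndexSketch,
Section, findBlockOffset and demoIndex are illustrative names, not
HBase classes, and it assumes each section is identified by the first
row key it covers and carries its own block index keyed by the first
key of each data block. The prefixes, key labels and offsets are made
up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, NOT an HBase API: models the extra lookup that a
// top-level section index adds in front of the existing block index.
public class SectionIndexSketch {

  // Row keys are compared as unsigned bytes (lexicographic byte order).
  static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

  // One section of the file; with PBE each section would use its own key.
  static final class Section {
    final String keyId; // hypothetical label for the section's encryption key
    // Block index for this section: first key of each data block -> file offset.
    final TreeMap<byte[], Long> blockIndex = new TreeMap<>(UNSIGNED);

    Section(String keyId) {
      this.keyId = keyId;
    }
  }

  // Top-level section index: first row key of each section -> section.
  private final TreeMap<byte[], Section> sectionIndex = new TreeMap<>(UNSIGNED);

  void addSection(byte[] firstKey, Section section) {
    sectionIndex.put(firstKey, section);
  }

  // Returns the offset of the only block that may contain rowKey, or -1 if
  // the key sorts before every section or before every block in its section.
  long findBlockOffset(byte[] rowKey) {
    // Extra indirection: locate the section whose key range covers rowKey.
    Map.Entry<byte[], Section> section = sectionIndex.floorEntry(rowKey);
    if (section == null) {
      return -1;
    }
    // The same kind of lookup a single-section file performs today.
    Map.Entry<byte[], Long> block = section.getValue().blockIndex.floorEntry(rowKey);
    return block == null ? -1 : block.getValue();
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Two sections with made-up prefixes, keys and offsets, for illustration only.
  static SectionIndexSketch demoIndex() {
    Section a = new Section("key-A");
    a.blockIndex.put(bytes("a"), 0L);
    a.blockIndex.put(bytes("f"), 4096L);
    Section m = new Section("key-M");
    m.blockIndex.put(bytes("m"), 8192L);
    m.blockIndex.put(bytes("t"), 12288L);

    SectionIndexSketch index = new SectionIndexSketch();
    index.addSection(bytes("a"), a);
    index.addSection(bytes("m"), m);
    return index;
  }

  public static void main(String[] args) {
    SectionIndexSketch index = demoIndex();
    for (String row : new String[] {"g", "n", "u", "A"}) {
      System.out.println(row + " -> " + index.findBlockOffset(bytes(row)));
    }
  }
}
```

Running main prints g -> 4096, n -> 8192, u -> 12288 and A -> -1. In
a file with a single section, only the second floorEntry lookup would
be needed; the first one is the added indirection whose cost we would
want to measure.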

As for generalizing this virtual HFile into a data partitioning
mechanism, one possibility is to extend it to things like configs and
quotas, but I am not sure how much value that would add.

> This does sound like a monstrous changeset. Do you have a strategy for
> delivering this in reviewable pieces?

We don't actually expect this change to be that large, because it
reuses the existing encryption code; the new code is mostly around key
management and caching. I have done a spike that gives an idea of the
kind of changes involved, available here:
https://github.com/haridsv/hbase/tree/key-meta-poc

We can definitely raise the PRs incrementally. E.g., at a coarse level,
we can raise a PR for key management first and then follow up with the
HFile changes. Each of these can in turn be split into incremental PRs
by slicing the features vertically (similar to how the spike changes
are structured), following the existing development patterns in HBase.

> Are there meaningful refactorings that should be done to the HFile classes
> along the way?

I think it is too early to answer this. We are aiming to reuse as much
of the existing code as possible for the new format, which might
require some refactoring, and we can definitely take up any obvious
improvements we come across along the way.

> Do you intend to backport to branch-2, or will you be content with getting it
> onto branch-3?

Yes, we would need this change for branch-2 as well.

> ---------- Forwarded message ----------
> From: Nick Dimiduk <ndimi...@apache.org>
> To: dev@hbase.apache.org
> Date: Thu, 13 Feb 2025 11:18:25 +0100
> Subject: Re: [DISCUSS] Row Prefix Based Encryption (PBE) to encrypt different 
> sections of HFile with different keys
> Hi Hari,
>
> I find it interesting, the different creative ways that people imagine
> use for HBase's ordered bytes physical characteristic. Usually those
> ideas are around region placement -- this is pretty clever! Outside of
> your specific use-case, can you propose other reasons why HBase should
> maintain such data partitioning? Are there other features that become
> possible with this enabled? I'd like to work out if this should be a
> default implementation of our storage system, or if this is going to
> forever sit behind a disabled feature flag.
>
> This does sound like a monstrous changeset. Do you have a strategy for
> delivering this in reviewable pieces? Are there meaningful
> refactorings that should be done to the HFile classes along the way?
> Do you intend to backport to branch-2, or will you be content with
> getting it onto branch-3?
>
> Thanks,
> Nick
>
