[ https://issues.apache.org/jira/browse/HBASE-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894387#action_12894387 ]
Jonathan Gray commented on HBASE-2893:
--------------------------------------
This sounds really interesting, Andy. I'm a little concerned, though, that it would be
rather disruptive to the code while being used by only a small fraction of users.
So the default behavior would be to always create the metacolumn family, and the
read path would always have these checks in it? Maybe this feature itself
should be a table-level setting, and we should try to move all of the related logic
into new classes with just a hook or two into the existing read-time
checks.
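A minimal sketch of that shape, assuming hypothetical names (RowPolicyHook, ReadPathSketch) rather than any existing HBase class: the table-level flag gates a single hook point in the per-KV check, and all feature logic lives behind the hook interface, so tables without the feature pay only for one boolean test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook interface: all metacolumn/policy logic would live behind it.
// Not an HBase API.
interface RowPolicyHook {
    /** Returns false if the KV at (row, timestamp) should be filtered out. */
    boolean allow(String row, long timestamp, long now);
}

// Hypothetical stand-in for the existing per-KV read check, with one hook point.
final class ReadPathSketch {
    private final boolean metacolumnsEnabled; // assumed table-level setting
    private final List<RowPolicyHook> hooks = new ArrayList<>();

    ReadPathSketch(boolean metacolumnsEnabled) {
        this.metacolumnsEnabled = metacolumnsEnabled;
    }

    void register(RowPolicyHook hook) {
        hooks.add(hook);
    }

    /** Called once per KV; tables without the feature pay one boolean test. */
    boolean include(String row, long timestamp, long now) {
        if (!metacolumnsEnabled) {
            return true;
        }
        for (RowPolicyHook hook : hooks) {
            if (!hook.allow(row, timestamp, now)) {
                return false;
            }
        }
        return true;
    }
}
```

With the flag off, registered hooks are never invoked, which is the property that would keep the feature from taxing users who don't enable it.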
The current QueryMatcher/Tracker code paths are starting to get a little messy,
and I'm a little worried about adding a bunch of new checks to every KV for
this or any other feature. There's some work going into seek/reseek
optimizations, and it's hard to move it forward because adding even another couple
of row checks can be significant if done on every KV.
In addition, this would break the pattern of each family being
processed in isolation. Reading each family would now require an additional
scanner against the metacolumn family. So if you read from a 5-family table
(+1 for meta), you'd end up reading the metacolumn 5 times, once for each user
family? Things like the bloom filter check would also have to happen during the
read, at a different level than where it's done currently.
Would this check be first, last, or scattered throughout the read checks? I
would guess first, but I'm not sure whether features other than TTLs
and ACLs might require some of the existing checks to run first. I'm also not quite
sure I understand the TTL use case: applying TTLs at row granularity seems
extremely rare. I suppose this kind of fine-grained policy setting is desirable,
but it's less clear to me why you couldn't break data up into separate tables
for varied TTLs or multi-tenancy, or, for very specific, fine-grained settings
like variable TTLs, implement them in your application.
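To make the row-granularity TTL case concrete, here is a minimal sketch assuming the metacolumn can be modeled as an in-memory map from row key to a TTL override in milliseconds, and that a cell counts as expired once it is older than the effective TTL. RowTtlPolicy and its methods are hypothetical, not HBase APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-row TTL overrides falling back to the family TTL.
final class RowTtlPolicy {
    private final long familyTtlMs; // the existing per-column-family TTL
    // Stands in for TTL entries stored in the metacolumn, keyed by row.
    private final Map<String, Long> ttlOverrides = new HashMap<>();

    RowTtlPolicy(long familyTtlMs) {
        this.familyTtlMs = familyTtlMs;
    }

    void setRowTtl(String row, long ttlMs) {
        ttlOverrides.put(row, ttlMs);
    }

    /** The per-row override if one exists, otherwise the family default. */
    long effectiveTtl(String row) {
        return ttlOverrides.getOrDefault(row, familyTtlMs);
    }

    /** A cell is treated as expired once its age exceeds the effective TTL. */
    boolean isExpired(String row, long cellTimestampMs, long nowMs) {
        return nowMs - cellTimestampMs > effectiveTtl(row);
    }
}
```

Rows without an entry take the family default, so only the rare overridden rows would need anything stored.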
When do you set this stuff? Would inserts be augmented? Would there be
special types of KVs that you could write at the same time you insert the
actual data? The description above addresses where this is stored and when it is
looked up, but not how it is set. Would Put now be extended with per-row setTTL
and setACL methods?
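One hypothetical answer to these questions, sketched in Java: a Put-like builder whose setTTL/setACL calls emit metacolumn cells in the same mutation as the data cells, and only when an override is actually set, which keeps the metacolumn sparse. MetaAwarePut, its Cell type, and the "__meta__" family name are illustrative assumptions; this is not the real Put API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Put-like mutation carrying per-row overrides. Not HBase's Put.
final class MetaAwarePut {
    static final String META_FAMILY = "__meta__"; // assumed metacolumn family name

    /** Minimal stand-in for a KeyValue: row, family, qualifier, value. */
    static final class Cell {
        final String row, family, qualifier, value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    private final String row;
    private final List<Cell> cells = new ArrayList<>();
    private Long ttlMs; // null means "use the family default", so nothing is stored
    private String acl; // null means "use the column-family ACL"

    MetaAwarePut(String row) {
        this.row = row;
    }

    MetaAwarePut add(String family, String qualifier, String value) {
        cells.add(new Cell(row, family, qualifier, value));
        return this;
    }

    MetaAwarePut setTTL(long ttlMs) {
        this.ttlMs = ttlMs;
        return this;
    }

    MetaAwarePut setACL(String acl) {
        this.acl = acl;
        return this;
    }

    /** Data cells plus metacolumn cells, written together in one mutation. */
    List<Cell> toCells() {
        List<Cell> out = new ArrayList<>(cells);
        // Only non-default settings produce metacolumn entries.
        if (ttlMs != null) out.add(new Cell(row, META_FAMILY, "ttl", Long.toString(ttlMs)));
        if (acl != null) out.add(new Cell(row, META_FAMILY, "acl", acl));
        return Collections.unmodifiableList(out);
    }
}
```

A plain put without setTTL/setACL yields only its data cells, matching the goal that values committed with defaults need no metacolumn storage.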
Out of curiosity, which BT-like systems support per-value ACLs? I don't think
I've seen this in any DBs I've worked with.
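For comparison, the per-row override model from the proposal (row-level ACLs layered over column-family ACLs, with the customer ID in the row key) might look like the following minimal sketch. ACLs are modeled as sets of principal names, the "/" row-key separator is an assumption, and AclResolver is a hypothetical class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: per-row ACL overrides layered over column-family ACLs.
// ACLs are sets of principal names; none of this is an HBase API.
final class AclResolver {
    private final Map<String, Set<String>> familyAcls = new HashMap<>();
    // Stands in for ACL entries stored in the metacolumn, keyed by row.
    private final Map<String, Set<String>> rowOverrides = new HashMap<>();

    /** Assumed row-key layout: customer ID first, so each tenant's rows are contiguous. */
    static String rowKey(String customerId, String itemId) {
        return customerId + "/" + itemId;
    }

    void setFamilyAcl(String family, Set<String> principals) {
        familyAcls.put(family, principals);
    }

    void setRowAcl(String row, Set<String> principals) {
        rowOverrides.put(row, principals);
    }

    boolean canRead(String principal, String row, String family) {
        Set<String> override = rowOverrides.get(row);
        if (override != null) {
            // A per-row override, if present, replaces the family ACL for that row.
            return override.contains(principal);
        }
        // Normal case: no per-row entry, so the column-family ACL applies.
        return familyAcls.getOrDefault(family, Collections.<String>emptySet()).contains(principal);
    }
}
```

Whether an override should replace or extend the family ACL is a design choice the proposal leaves open; this sketch assumes replacement.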
> Table metacolumns
> -----------------
>
> Key: HBASE-2893
> URL: https://issues.apache.org/jira/browse/HBASE-2893
> Project: HBase
> Issue Type: New Feature
> Reporter: Andrew Purtell
>
> Some features like TTLs or access control lists have use cases that call for
> per-value configurability.
> Currently in HBase, TTLs are set per column family. This leads to potentially
> awkward "bucketing" of values into column families set up to accommodate the
> TTLs commonly desired for all values within them. The result is an unnecessarily
> wide schema, with an unnecessary reduction in I/O locality in access patterns,
> more store files than otherwise, and so on.
> Over in HBASE-1697 we're considering setting ACLs on column families.
> However, we are aware of other BT-like systems which support per-value ACLs.
> Per-value ACLs allow for multitenancy in a single table, as opposed to requiring
> a table (or at least a column family) for each customer. A single table has
> better scale-out properties than the alternatives. I think supporting per-row
> ACLs would generally be sufficient: the customer ID could be part of the row key.
> We can still plan to maintain column-family-level ACLs, so we would not have to
> bloat the store with per-row ACLs for the normal case, but it would be highly
> useful to support overrides for particular rows. So how to do that?
> I propose to introduce _metacolumns_.
> A _metacolumn_ would be a column family intrinsic to every table, created by
> the system at table create time. It would be accessible and would function
> like any other column family, except that we expect a default ACL that only
> allows access by the system and operator principals, and administrative
> actions such as renaming or deletion would not be allowed.
> Per-row overrides for such things as ACLs and TTLs would be stored in the
> metacolumn. The metacolumn would therefore be as sparse as possible; no storage
> would be required for overrides if a value is committed with defaults. A
> reasonably sparse metacolumn for a region may fit entirely within blockcache.
> It may be possible for all metacolumns on a RegionServer to fit within
> blockcache without undue pressure on other users. We can aim design effort at
> this target.
> The scope of changes required to support this is:
> - Introduce the metacolumn concept into the code and into the security model
> (default ACL): a flag in HCD (HColumnDescriptor), a default ACL, and a few
> additional checks for rejecting disallowed administrative actions.
> - Automatically create metacolumns at table create time.
> - Consult the metacolumn as part of processing reads or mutations, perhaps using
> a bloom filter to shortcut lookups for rows with no metaentries, and apply
> configuration or security policy overrides if found.
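The three items in the quoted list could be sketched roughly as follows. This is a minimal, hypothetical model: FamilySpec stands in for HCD, the metacolumn store is an in-memory map, the "__meta__" family name is an assumption, and the Bloom filter is a simple double-hashing BitSet filter rather than HBase's own implementation. Forbidding rename and delete comes from the proposal; treating every other action as allowed is an assumption.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a column-family descriptor with the metacolumn flag. */
final class FamilySpec {
    final String name;
    final boolean metacolumn;

    FamilySpec(String name, boolean metacolumn) {
        this.name = name;
        this.metacolumn = metacolumn;
    }
}

enum AdminAction { RENAME, DELETE, COMPACT }

final class MetacolumnSupport {
    static final String META_FAMILY = "__meta__"; // assumed name
    private static final Set<AdminAction> FORBIDDEN = EnumSet.of(AdminAction.RENAME, AdminAction.DELETE);

    /** Table creation always appends the metacolumn family after the user's families. */
    static List<FamilySpec> familiesForNewTable(List<String> userFamilies) {
        List<FamilySpec> out = new ArrayList<>();
        for (String f : userFamilies) out.add(new FamilySpec(f, false));
        out.add(new FamilySpec(META_FAMILY, true));
        return out;
    }

    /** Rejects disallowed administrative actions on the metacolumn. */
    static void checkAdmin(FamilySpec family, AdminAction action) {
        if (family.metacolumn && FORBIDDEN.contains(action)) {
            throw new IllegalStateException(action + " not allowed on " + family.name);
        }
    }

    // Read path: a Bloom filter over the rows that have metaentries.
    private static final int NUM_BITS = 1 << 16;
    private static final int NUM_HASHES = 3;
    private final BitSet bloom = new BitSet(NUM_BITS);
    private final Map<String, String> metaStore = new HashMap<>(); // stands in for the metacolumn
    int storeLookups = 0; // lookups that got past the Bloom filter

    private static int bloomIndex(String row, int i) {
        int h1 = row.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1; // cheap second hash
        return Math.floorMod(h1 + i * h2, NUM_BITS); // double hashing
    }

    void putOverride(String row, String value) {
        metaStore.put(row, value);
        for (int i = 0; i < NUM_HASHES; i++) bloom.set(bloomIndex(row, i));
    }

    /** Returns the row's override, or null; most rows have none and skip the store. */
    String lookupOverride(String row) {
        for (int i = 0; i < NUM_HASHES; i++) {
            if (!bloom.get(bloomIndex(row, i))) return null; // definitely no metaentry
        }
        storeLookups++; // possible metaentry (or a false positive)
        return metaStore.get(row);
    }
}
```

A Bloom filter never gives false negatives, so a row whose bits are not all set can safely skip the metacolumn lookup; a false positive only costs one extra lookup that finds nothing.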
--
This message is automatically generated by JIRA.