[
https://issues.apache.org/jira/browse/HBASE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393697#comment-13393697
]
Lars George commented on HBASE-6222:
------------------------------------
bq. @Andy That's not essential to storing labels in a metacolumn, though it may
be advisable for performance reasons.
Understood. I am not saying this is needed or that metacolumns do not work
without it. In fact, I think that they are very useful in the context you
discussed with Matt, for example for TTLs. I personally think there is a need
for two optional features: a) metacolumns - which cover broader rules for
many columns or rows, and b) KV tags - which are carried as low as they can get
to retain per-cell information.
So for TTL I would think that tags are too low-level, yet for security I do
think that metacolumns are too "weak" a guarantee.
bq. @Andy and @Matt: So we may have this as a way to store tags inline with
data, with dedup/optimize away if not needed; and we may have Lars' somehow tag
structure addition to KV (Lars: what would that look like?). Worth doing a
bake-off?
I think this is not either/or. If we add Trie compression - and Matt, please
correct me if I am mistaken - then we can leverage that implementation to
handle the tags. If we decide not to merge the two, then we can use my
suggestion of adding them to the KV optionally and handle the compression
implications later.
bq. @Andy: We could agree on criteria such as: Tag storage optimized out if no
tags present
Indeed, since we use a new type, no extra storage is needed if no tag is
attached.
bq. @Andy: Compartmentalized changes
Agreed, we add a new type and handle that case separately. Though the majority
of the code is shared, the new type would trigger the extraction of the tags if
called for (which I assume would be done lazily).
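To make the "new type, no extra storage" point concrete, here is a rough
sketch of one way it could look. Everything in it is an assumption made for
illustration: the class name CellLayoutSketch, the two type codes, and the byte
layout are hypothetical and do not reflect the actual KeyValue format. It only
shows that a cell written with the plain type carries no tag bytes at all, and
that the tags of a tagged cell are extracted on first access only.

```java
import java.nio.ByteBuffer;

/** Illustrative only: a made-up cell layout, NOT the real HBase KeyValue format. */
final class CellLayoutSketch {
    // Hypothetical type codes, chosen only for this sketch.
    static final byte TYPE_PUT = 4;
    static final byte TYPE_PUT_WITH_TAGS = 5;

    private final byte[] buf;
    private byte[] tags;            // filled in on the first call to tags()
    private boolean tagsExtracted;

    CellLayoutSketch(byte[] buf) {
        this.buf = buf;
    }

    /**
     * Layout: [type:1][valueLen:4][value] and, only for the tagged type,
     * [tagsLen:2][tags]. Untagged cells pay nothing for the feature.
     * Assumes the tag section is shorter than 32 KB.
     */
    static byte[] encode(byte[] value, byte[] tags) {
        boolean tagged = tags != null && tags.length > 0;
        int size = 1 + 4 + value.length + (tagged ? 2 + tags.length : 0);
        ByteBuffer out = ByteBuffer.allocate(size);
        out.put(tagged ? TYPE_PUT_WITH_TAGS : TYPE_PUT);
        out.putInt(value.length);
        out.put(value);
        if (tagged) {
            out.putShort((short) tags.length);
            out.put(tags);
        }
        return out.array();
    }

    byte[] value() {
        ByteBuffer in = ByteBuffer.wrap(buf);
        in.get();                               // skip the type byte
        byte[] value = new byte[in.getInt()];
        in.get(value);
        return value;
    }

    boolean tagsExtracted() {
        return tagsExtracted;
    }

    /** Parses the tag section lazily; returns an empty array for untagged cells. */
    byte[] tags() {
        if (!tagsExtracted) {
            tags = new byte[0];
            if (buf[0] == TYPE_PUT_WITH_TAGS) {
                ByteBuffer in = ByteBuffer.wrap(buf);
                in.get();                       // skip the type byte
                int valueLen = in.getInt();
                in.position(in.position() + valueLen);
                tags = new byte[in.getShort()];
                in.get(tags);
            }
            tagsExtracted = true;
        }
        return tags;
    }
}
```

In this layout a three-byte value encodes to 8 bytes without tags and to 12
bytes with a two-byte tag section, and reading value() never touches the tags.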
bq. @Andy: Generic mechanism for adding, reading, removing, and modifying tags,
usable by coprocessors.
These are the KeyValue.addTag(byte[] name, byte[] value) and
KeyValue.getTag(byte[] name) helpers I was referring to. Coprocessors have
full access that way, since the tags are carried with each KV.
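A minimal sketch of how such helpers might behave follows. The method names
addTag and getTag are the ones proposed above; the TaggedKeyValue class, the
in-memory map, and the "visibility" tag name used in the note below are
assumptions for illustration and not existing HBase code.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a KeyValue that carries tags; not the real HBase class. */
final class TaggedKeyValue {
    // ByteBuffer keys compare by content, which raw byte[] keys would not.
    private final Map<ByteBuffer, byte[]> tags = new LinkedHashMap<>();

    /** Adds or replaces the tag with the given name. Inputs are copied defensively. */
    void addTag(byte[] name, byte[] value) {
        tags.put(ByteBuffer.wrap(name.clone()), value.clone());
    }

    /** Returns a copy of the tag value, or null if no such tag is attached. */
    byte[] getTag(byte[] name) {
        byte[] value = tags.get(ByteBuffer.wrap(name));
        return value == null ? null : value.clone();
    }

    boolean hasTags() {
        return !tags.isEmpty();
    }
}
```

A coprocessor-style hook could then call kv.getTag("visibility".getBytes(...))
on each KV it sees and act on the result; a null return means the cell carries
no such tag.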
bq. @Andy: No we don't have to mimic the Accumulo API though if the goal here
is to be an alternative, it must be possible to build a direct API translation
shim that provides the same labelling and visibility semantics.
Indeed. One of the arguments I hear comparing HBase and Accumulo is the fact
that we have no cell-level security tagging. That is what this is all about. My
proposal is - as far as I can tell - lean (as it uses no extra storage if not
used), can be combined with the non-cell-level security (you might not want
this level of security, to avoid extra baggage), does not change the
comparators, and overall is quite non-intrusive in existing code. On top of
that, it seems useful for other cell-level features in the future.
As Jon says, Accumulo uses these tags and the always-on filter to achieve
security (at a very high level), and so could we. To me that makes the two
comparable. We do not need to comply with the entire API, only to match it at
the feature-set level.
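At that very high level, the "tags plus always-on filter" idea can be sketched
as below. This is an illustration under stated assumptions, not Accumulo's or
HBase's implementation: LabelledCell and VisibilityFilterSketch are
hypothetical names, and a cell here simply lists labels that must all be held
by the reader, whereas Accumulo's visibility labels are richer boolean
expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical cell with per-cell security labels; not an HBase or Accumulo type. */
final class LabelledCell {
    final String row;
    final String column;
    final Set<String> requiredLabels;

    LabelledCell(String row, String column, String... requiredLabels) {
        this.row = row;
        this.column = column;
        this.requiredLabels = new HashSet<>(Arrays.asList(requiredLabels));
    }
}

/** Sketch of a filter applied to every read, regardless of what the client asks for. */
final class VisibilityFilterSketch {
    /** Keeps only cells whose labels are all among the reader's authorizations. */
    static List<LabelledCell> filter(List<LabelledCell> cells, Set<String> authorizations) {
        List<LabelledCell> visible = new ArrayList<>();
        for (LabelledCell cell : cells) {
            // Unlabelled cells have an empty label set and are always visible.
            if (authorizations.containsAll(cell.requiredLabels)) {
                visible.add(cell);
            }
        }
        return visible;
    }
}
```

A reader holding only the "secret" authorization would see unlabelled cells and
cells labelled "secret", but not a cell that also requires "ops".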
bq. @stack: A core of required's with optional tags that don't cost unless you
use them would be grand.
That is exactly my point. As for "KV in KV", I do not see how this is "odd",
as our KeyValue is the odd one to begin with, given what most people understand
a KV to be. Coming to terms with our complex key and various sorting rules is
not trivial.
bq. @stack: Good point. Maybe not even lost, mayhaps a bug would cause us to
skip the metacolumn?
Spot on!
bq. @Matt: I guess I'm saying it's maybe ok to muck up the current KV even more
given that data block encoding should be able to clean up the mess down the
road. That being said, I don't personally need this feature so I hate to
suggest mucking up anything!
Agreed, this is about timing as well. Your patch is highly intrusive - but for
good reasons. So I would love to discuss this current issue with your changes
already applied. But on the other hand we have to make a call on what we want
and when.
@Laxman: The basic premise here is to be on par with Accumulo security-wise.
That is the use case. As for scalability, I do not see why a few extra bytes
and a coprocessor that checks them would be disastrous. Sure, this needs
evaluation, but we know that other systems - like Accumulo - do it, so if
someone wants to enable it, they should see the same impact, small or big. Or,
asking the other way around: where do you see this affecting performance?
bq. How about other approach of supporting access control through HBase views?
The issue is that these are typically only on the row level. At the cell
level you can filter as finely as possible. Views - and please object if
I am wrong - are more coarse-grained. Think of blocking access to some columns
differently across many rows, not just all CF/CQs allowed for all rows.
The latter is the crucial difference, and it is what is needed to be on par.
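The difference in granularity can be shown with a small sketch. It rests on
assumptions: HBase has no view feature to quote, so a "view" is modelled here
as a single column allow-list applied to every row, and the GranularitySketch
class, its method names, and the "row/column" map key are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical comparison of a view-style rule and a per-cell rule. */
final class GranularitySketch {
    /** View-style rule (assumed): one column allow-list, identical for every row. */
    static boolean viewAllows(Set<String> allowedColumns, String row, String column) {
        return allowedColumns.contains(column); // the row plays no part in the decision
    }

    /**
     * Cell-style rule: the required label is looked up per (row, column).
     * A missing entry means the cell is unlabelled and readable by anyone.
     */
    static boolean cellAllows(Map<String, String> labels, Set<String> authorizations,
                              String row, String column) {
        String required = labels.get(row + "/" + column);
        return required == null || authorizations.contains(required);
    }
}
```

With a label "hr" on r1/cf:salary only, a reader without "hr" is refused that
column in row r1 but allowed it in row r2. The view-style rule has to give the
same answer for both rows.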
> Add per-KeyValue Security
> -------------------------
>
> Key: HBASE-6222
> URL: https://issues.apache.org/jira/browse/HBASE-6222
> Project: HBase
> Issue Type: New Feature
> Components: security
> Reporter: stack
>
> Saw an interesting article:
> http://www.fiercegovernmentit.com/story/sasc-accumulo-language-pro-open-source-say-proponents/2012-06-14
> "The Senate Armed Services Committee version of the fiscal 2013 national
> defense authorization act (S. 3254) would require DoD agencies to foreswear
> the Accumulo NoSQL database after Sept. 30, 2013, unless the DoD CIO
> certifies that there exists either no viable commercial open source database
> with security features comparable to [Accumulo] (such as the HBase or
> Cassandra databases)..."
> Not sure what a 'commercial open source database' is, and I'm not sure what's
> going on in the article, but tra-la-la'ing, if we had per-KeyValue 'security'
> like Accumulo's, we might put ourselves in the running for federal
> contributions?
--
This message is automatically generated by JIRA.