Hi Don (apologies for the earlier misspelling), I was a bit off in my analysis: I mixed up client and plugin. getTags can definitely work, but I don’t think we should move everything to the “advanced” side of the plugin. In other words, I think the TagEnricher should still do its thing.
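As a rough sketch of how a plugin-level getTags hook could coexist with the TagEnricher: the plugin may supply tags, and the enricher still merges them with the system-supplied ones. Every type and method name below (TagSketch, Tag, TagAwarePlugin, enrich) is hypothetical and not an existing Ranger API. The newest-wins rule is only an assumption based on the updatedTime idea mentioned further down in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are existing Ranger classes. */
final class TagSketch {

    /** Minimal stand-in for a tag: a type name plus its last update time (epoch millis). */
    static final class Tag {
        final String type;
        final long updatedTime;

        Tag(String type, long updatedTime) {
            this.type = type;
            this.updatedTime = updatedTime;
        }
    }

    /** Hypothetical plugin hook: custom plugins may override getTags to supply their own tags. */
    interface TagAwarePlugin {
        default List<Tag> getTags(String resource) {
            return Collections.emptyList(); // tag-unaware plugins contribute nothing
        }
    }

    /**
     * Stand-in for the enricher step: starts from the system-supplied tags and
     * lets a plugin-supplied tag replace one of the same type only if it is newer.
     */
    static List<Tag> enrich(List<Tag> systemTags, TagAwarePlugin plugin, String resource) {
        Map<String, Tag> newestByType = new LinkedHashMap<>();
        for (Tag tag : systemTags) {
            keepNewest(newestByType, tag);
        }
        for (Tag tag : plugin.getTags(resource)) {
            keepNewest(newestByType, tag);
        }
        return new ArrayList<>(newestByType.values());
    }

    /** Stores the tag unless a tag of the same type with an equal or later update time exists. */
    private static void keepNewest(Map<String, Tag> newestByType, Tag candidate) {
        Tag existing = newestByType.get(candidate.type);
        if (existing == null || candidate.updatedTime > existing.updatedTime) {
            newestByType.put(candidate.type, candidate);
        }
    }
}
```

With this shape, a tag-unaware plugin needs no changes (the default getTags returns nothing), while a tag-aware plugin overrides getTags and leaves the precedence logic to the enricher.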
I’m going to be AFK for a while, but I’ll pick this up after that.

B.

> On 7 Dec 2018, at 20:12, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> Hi Dan,
>
> Thanks for thinking along. Answers inline again.
>
> B.
>
> Sent from my iPad
>
>> On 7 Dec 2018, at 19:42, Don Bosco Durai <bo...@apache.org> wrote:
>>
>> Hi Bolke
>>
>> Thanks for the suggestion and contribution.
>>
>> I am trying to understand your approach. Can you add your patch in Review
>> Board? It will be easy to see the changes you have done visually.
>
> Will do after fixing the selection of which tag version takes precedence
> (see below).
>
>>
>> I have a few questions and design suggestions.
>> 1. When you say "Client", are you referring to "Plugin Implementation Code"
>> or "End User" (similar to the Accumulo model)?
>
> Plugin. I don’t trust the user :-). At least, if by plugin you mean the
> agent inside, for example, Hive.
>
>> 2. If you meant "Plugin" or "Custom Plugin", then I feel it is a good
>> suggestion and we should support it. If it is end-user, then it is a longer
>> discussion.
>
> No long discussion then :-)
>
>> 3. Based on the discussion so far and reviewing the code change at a high
>> level, it seems you are extending at the Tag Enricher level. Alternatively,
>> would it be more design friendly to provide a method/API in the plugin
>> interface to return or override the tags, e.g. getTags( request )? Custom
>> plugins can override this method and alter the tags to be returned. This
>> might be a more isolated and cleaner implementation, so the plugin writers
>> can focus only on their plugin implementation.
>
>
> That could work. Taking that a bit further, I was always thinking that
> ‘context’ was something to be supplied by the client. It caught me by
> surprise that a client (i.e. plugin) can actually set no additional context
> at all. So instead of a getTags I suggest an “updateContext” with namespaced
> keys. I think that is more future proof.
> However, it won’t be backwards
> compatible (this is why I chose to implement it as it is now). So when you
> upgrade Ranger you would need to update all your clients at once. Adding
> getTags would do the same. What do you think?
>
> I’m now working out which tag takes precedence, e.g. a system-supplied one or
> a client-supplied one. I will use the updatedTime field for this. This would
> need somewhat more complexity than I would expect a custom plugin to handle.
>
>
>> 4. If needed, for advanced users, we can provide an interface or API to
>> implement their own Tag Sync, which could be in addition to Atlas/Kafka or
>> exclusive to their environment or Meta Store.
>
> I suggest making it possible to have multiple syncs, to set the order in
> which they should be evaluated, and to define a hierarchy of which one can
> overwrite the other. But this is for later.
>
>>
>> Thanks
>>
>> Bosco
>>
>>
>>
>> On 12/5/18, 11:59 AM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:
>>
>> Hi Abhay,
>>
>> Also answers inline.
>>
>> B.
>>
>> Sent from my iPad
>>
>>> On 5 Dec 2018, at 20:25, Abhay Kulkarni <akulka...@hortonworks.com>
>>> wrote:
>>>
>>> Hi Bolke,
>>>
>>> My comments inline.
>>>
>>> Thanks,
>>> -Abhay
>>>
>>>> On 12/4/18, 1:07 PM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:
>>>>
>>>> Hi Abhay,
>>>>
>>>> Good point on #1; I will take that into account if possible (can an
>>>> enricher call audit events?).
>>>>
>>>> On #2 yes, otherwise the resource matcher will stop working. Maybe proper
>>>> namespacing is the way to go here. Implementing it this way ensures
>>>> backwards compatibility. On a broader note, I think Ranger is lacking
>>>> here. Context could also be provided by the client and there is no real
>>>> clean way of doing this at the moment.
>>>
>>> Abhay> I will need to take a look to figure out why resource matcher will
>>> not work.
>>> However, instead of implementing a new API (removeValue()), is
>>> it possible to use the setValue() API to set the KEY_CLIENT_TAG entry to
>>> null?
>>
>> I don’t think that is possible. The resource matcher checks for elements,
>> and setting it to null means it is present, which means the signature still
>> doesn’t match.
>>
>>>>
>>>> Question: should client tags only apply to SELF, or also to
>>>> SELF_OR_DESCENDENT and ANCESTOR? I wasn’t sure here.
>>>
>>> Abhay> I don’t see any issue, at this time, with applying client-tags when
>>> match-type is SELF, SELF_OR_DESCENDENT or ANCESTOR.
>>
>> This means a client tag will match against all of them at any time. The
>> client isn’t aware of match-types. Correct?
>>
>>>>
>>>> Second question (a bit unrelated): how scalable is the tagsync approach?
>>>> If we have millions of tagged files and sources and they all end up being
>>>> registered in Ranger, this could easily grow exponentially, besides
>>>> getting outdated. The other approach could be to have this handled in the
>>>> client (pick up info from the TagSource, i.e. Atlas, and supply this to
>>>> the policy engine).
>>>
>>> Abhay> I see that there is some lag involved. But, overall, the
>>> architecture allows for tag-based policies (really the ABAC way of
>>> authorization) to be applied across all components uniformly. Having
>>> ranger-admin as a central repository of policies and tags, and components
>>> as simply clients downloading these artifacts, has many more advantages
>>> than each component having to do all the work by itself. Also, any Kafka
>>> delay would still be an issue even if components received tags directly
>>> from Atlas without ranger-admin mediating tag transfer. Moreover, there
>>> are several optimizations possible (such as incremental download of tags,
>>> not implemented yet) which can speed up tag downloads significantly.
>>> With
>>> a large number of tags, surely, the size of the ranger-admin tag tables
>>> will increase, but IMO, it is a fair trade-off considering all the other
>>> advantages this architecture provides us. Also, it will be useful to know
>>> the order of magnitude of delay you experienced (other than possibly up to
>>> 1 minute delay because of the interval between tag downloads).
>>
>> The one minute is already too much for us. The example I gave happens
>> within a few milliseconds, so basically any delay is not acceptable.
>>
>> To me it seems architecturally incorrect to have Ranger be a source for
>> tags, as that is Atlas (or some other). Ranger is duplicating things here
>> rather than sticking to what it is good at: policies. Clients are already
>> downloading tags; doing that from Atlas instead of Ranger does not add a
>> lot of complexity and can be handled in the plugin transparently. But that
>> is just my opinion.
>>
>> Maybe there is a possibility to accept client tags as temporary tags in
>> Ranger that can then be overwritten by the Tag Store (i.e. Atlas). Just
>> thinking out loud.
>>
>>>>
>>>> Cheers
>>>> Bolke
>>>>
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On 4 Dec 2018, at 21:51, Abhay Kulkarni
>>>>> <akulka...@hortonworks.com> wrote:
>>>>>
>>>>> Hi Bolke,
>>>>>
>>>>> This looks like a good addition to tag-based authorization in Ranger. I
>>>>> will review the patch separately. However, here are a few thoughts.
>>>>>
>>>>> 1. If the client component is tag-aware and client-supplied tags
>>>>> overwrite admin-supplied tags, audit needs to record this very clearly.
>>>>> This will avoid any potential confusion about why the authorization
>>>>> decision was different only for a certain component (or a certain type
>>>>> of component).
>>>>>
>>>>> 2. Do the client-supplied tags have to be removed from the
>>>>> access-request?
>>>>>
>>>>> Thanks,
>>>>> -Abhay
>>>>>
>>>>>> On 12/4/18, 6:02 AM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Ranger assumes that clients are tag unaware. So the Tag Enricher is
>>>>>> dependent on a resource-to-tag mapping supplied externally by, for
>>>>>> example, Apache Atlas. We found out that having tags available in
>>>>>> Ranger can have a prohibitive delay. For example, data arrives at the
>>>>>> platform and is being tagged programmatically in Apache Atlas. Atlas
>>>>>> then puts the data on Kafka and Ranger picks it up. The client (or
>>>>>> another) needs to refresh its policies before the tagging info becomes
>>>>>> available for evaluation. Typically, this can be too slow. Kafka
>>>>>> introduces a lag and the policy refresh also introduces a lag (tested).
>>>>>>
>>>>>> If the client is tag aware and could supply this information to the
>>>>>> plugin, policy evaluation could continue. I have created
>>>>>> https://issues.apache.org/jira/browse/RANGER-2302 to track this. I also
>>>>>> have created an initial patch. The patch allows a client to set the
>>>>>> special "RangerTagEnricher.KEY_CLIENT_TAGS" as a value in the access
>>>>>> request. This will then be picked up by the Tag Enricher. Currently,
>>>>>> client-supplied tags overwrite the system-supplied tags. The reason for
>>>>>> this is that the client might have more recent information. Most likely
>>>>>> this will need to be checked against the "updated" field in the tag
>>>>>> itself, but that wasn't readily available.
>>>>>>
>>>>>> I am looking for feedback to see if we can have this in. Or are there
>>>>>> other ways to solve this?
>>>>>>
>>>>>> Cheers
>>>>>> Bolke
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>