Hi Abhay,

My answers are inline as well.
B.

Sent from my iPad

> On 5 Dec 2018, at 20:25, Abhay Kulkarni <akulka...@hortonworks.com>
> wrote:
>
> Hi Bolke,
>
> My comments inline.
>
> Thanks,
> -Abhay
>
>> On 12/4/18, 1:07 PM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:
>>
>> Hi Abhay,
>>
>> Good point on #1; I will take that into account if possible (can an
>> enricher call audit events?).
>>
>> On #2: yes, otherwise the resource matcher will stop working. Maybe
>> proper namespacing is the way to go here. Implementing it this way
>> ensures backwards compatibility. On a broader note, I think Ranger is
>> lacking here. Context could also be provided by the client, and there
>> is no really clean way of doing this at the moment.
>
> Abhay> I will need to take a look to figure out why the resource
> matcher will not work. However, instead of implementing a new API
> (removeValue()), is it possible to use the setValue() API to set the
> KEY_CLIENT_TAG entry to null?

I don’t think that is possible. The resource matcher checks for the
presence of elements, and setting the entry to null means it is still
present, which means the signature still doesn’t match.

>>
>> Question: should client tags only apply to SELF, or also
>> SELF_OR_DESCENDENT and ANCESTOR? I wasn’t sure here.
>
> Abhay> I don’t see any issue, at this time, with applying client tags
> when the match-type is SELF, SELF_OR_DESCENDENT or ANCESTOR.

This means a client tag will match against all of them at any time. The
client isn’t aware of match-types. Correct?

>>
>> Second question (a bit unrelated): how scalable is the tagsync
>> approach? If we have millions of tagged files and sources and they all
>> end up being registered in Ranger, this could easily grow
>> exponentially, besides getting outdated. The other approach could be
>> to have this handled in the client (pick up info from the TagSource,
>> i.e. Atlas, and supply this to the policy engine).
>
> Abhay> I see that there is some lag involved.
> But, overall, the architecture allows for tag-based policies (really
> the ABAC way of authorization) to be applied across all components
> uniformly. Having ranger-admin as a central repository of policies and
> tags, and components as simply clients downloading these artifacts,
> has many more advantages than each component having to do all the work
> by itself. Also, any Kafka delay would still be an issue even if
> components received tags directly from Atlas without ranger-admin
> mediating the tag transfer. Moreover, there are several optimizations
> possible (such as incremental download of tags - not implemented yet)
> which can speed up tag downloads significantly. With a large number of
> tags, surely, the size of the ranger-admin tag tables will increase,
> but IMO, it is a fair trade-off considering all the other advantages
> this architecture provides us. Also, it will be useful to know the
> order of magnitude of the delay you experienced (other than possibly
> up to 1 minute of delay because of the interval between tag
> downloads).

The one minute is already too much for us. The example I gave happens
within a few milliseconds, so basically any delay is not acceptable. To
me it seems architecturally incorrect to have Ranger be a source for
tags, as that is Atlas (or some other system). Ranger is duplicating
things here rather than sticking to what it is good at: policies.
Clients are already downloading tags; doing that from Atlas instead of
Ranger does not add a lot of complexity and can be handled transparently
in the plugin. But that is just my opinion. Maybe there is a possibility
to accept client tags as temporary in Ranger, which can then be
overwritten by the Tag Store (i.e. Atlas). Just thinking out loud.

>>
>> Cheers
>> Bolke
>>
>>
>> Sent from my iPad
>>
>>> On 4 Dec 2018, at 21:51, Abhay Kulkarni <akulka...@hortonworks.com>
>>> wrote:
>>>
>>> Hi Bolke,
>>>
>>> This looks like a good addition to tag-based authorization in Ranger.
>>> I will review the patch separately. However, here are a few thoughts.
>>>
>>> 1. If the client component is tag-aware and client-supplied tags
>>> overwrite admin-supplied tags, audit needs to record this very
>>> clearly. This will avoid any potential confusion about why the
>>> authorization decision was different only for a certain component (or
>>> a certain type of component).
>>>
>>> 2. Do the client-supplied tags have to be removed from the
>>> access-request?
>>>
>>> Thanks,
>>> -Abhay
>>>
>>>> On 12/4/18, 6:02 AM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> Ranger assumes that clients are tag-unaware, so the Tag Enricher
>>>> depends on a resource-to-tag mapping supplied externally by, for
>>>> example, Apache Atlas. We found that having tags available in Ranger
>>>> can involve a prohibitive delay. For example, data arrives at the
>>>> platform and is tagged programmatically in Apache Atlas. Atlas then
>>>> puts the data on Kafka and Ranger picks it up. The client (or
>>>> another) needs to refresh its policies before the tagging info
>>>> becomes available for evaluation. Typically, this can be too slow:
>>>> Kafka introduces a lag and the policy refresh also introduces a lag
>>>> (tested).
>>>>
>>>> If the client is tag-aware and could supply this information to the
>>>> plugin, policy evaluation could continue. I have created
>>>> https://issues.apache.org/jira/browse/RANGER-2302 to track this. I
>>>> have also created an initial patch. The patch allows a client to set
>>>> the special "RangerTagEnricher.KEY_CLIENT_TAGS" as a value in the
>>>> access request. This will then be picked up by the Tag Enricher.
>>>> Currently, client-supplied tags overwrite the system-supplied tags.
>>>> The reason for this is that the client might have more recent
>>>> information.
>>>> Most likely this will need to be checked against the "updated" field
>>>> in the tag itself, but that wasn't readily available.
>>>>
>>>> I am looking for feedback to see if we can have this in. Or are there
>>>> other ways to solve this?
>>>>
>>>> Cheers
>>>> Bolke
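
To illustrate the setValue() versus removeValue() point raised in this
thread, here is a minimal Java sketch. It assumes the request context
behaves like a plain java.util.Map and that the matcher only checks
whether a key is present. The class ContextSketch, the method isPresent()
and the key string are hypothetical stand-ins for illustration, not
Ranger's actual classes or values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an access-request context; NOT Ranger's real class.
final class ContextSketch {
    // Illustrative key string; the thread refers to RangerTagEnricher.KEY_CLIENT_TAGS.
    static final String KEY_CLIENT_TAGS = "CLIENT_TAGS";

    private final Map<String, Object> context = new HashMap<>();

    // Stores the value under the key, even when the value is null.
    void setValue(String key, Object value) {
        context.put(key, value);
    }

    // Drops the key entirely, so a presence check no longer sees it.
    void removeValue(String key) {
        context.remove(key);
    }

    // Assumed matcher behaviour: it only asks whether the key exists.
    boolean isPresent(String key) {
        return context.containsKey(key);
    }

    public static void main(String[] args) {
        ContextSketch ctx = new ContextSketch();
        ctx.setValue(KEY_CLIENT_TAGS, "PII");

        // Overwriting with null keeps the key in the map.
        ctx.setValue(KEY_CLIENT_TAGS, null);
        System.out.println("after setValue(null): present=" + ctx.isPresent(KEY_CLIENT_TAGS));

        // Removing the key is what makes it disappear.
        ctx.removeValue(KEY_CLIENT_TAGS);
        System.out.println("after removeValue():  present=" + ctx.isPresent(KEY_CLIENT_TAGS));
    }
}
```

Under these assumptions, a null value still counts as "present", so only
removal changes what a presence-based check sees. Whether Ranger's
resource matcher actually behaves this way would need to be confirmed
against the real code.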