Re: Allow clients to supply tag information

Bolke de Bruin Sun, 09 Dec 2018 07:28:00 -0800

Hi Don (apologies for the earlier misspelling),

I was a bit off in my analysis: I mixed up client and plugin. getTags can 
definitely work, but I don’t think we should move everything to the “advanced” 
side of the plugin. In other words I think the TagEnricher should still do its 
thing.
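
As a rough illustration of how a getTags hook and the TagEnricher could 
coexist, here is a minimal sketch in Java. Everything in it is an assumption 
made for illustration: Tag, merge and ClientTagMergeSketch are hypothetical 
names, not Ranger classes, and the rule used (the copy with the newest 
updatedTime wins, the system copy is kept on a tie) is only one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are part of Ranger.
final class ClientTagMergeSketch {

    // Minimal stand-in for a tag: a type name plus the time it was last updated.
    static final class Tag {
        final String type;
        final long updatedTime; // epoch millis, assumed to be set by the tag's producer

        Tag(String type, long updatedTime) {
            this.type = type;
            this.updatedTime = updatedTime;
        }
    }

    // Merge system-supplied tags (for example, synced from a tag store) with tags
    // returned by a plugin-level getTags(request) hook. Per tag type, the most
    // recently updated copy wins; on a tie the system-supplied copy is kept.
    static List<Tag> merge(List<Tag> systemTags, List<Tag> clientTags) {
        Map<String, Tag> byType = new LinkedHashMap<>();
        for (Tag t : systemTags) {
            byType.merge(t.type, t, (old, cur) -> cur.updatedTime > old.updatedTime ? cur : old);
        }
        for (Tag t : clientTags) {
            byType.merge(t.type, t, (old, cur) -> cur.updatedTime > old.updatedTime ? cur : old);
        }
        return new ArrayList<>(byType.values());
    }

    public static void main(String[] args) {
        List<Tag> system = Arrays.asList(new Tag("PII", 100L), new Tag("FINANCE", 300L));
        List<Tag> client = Arrays.asList(new Tag("PII", 200L), new Tag("FINANCE", 250L));
        for (Tag t : merge(system, client)) {
            System.out.println(t.type + " updated at " + t.updatedTime);
        }
    }
}
```

In this sketch the enricher stays in charge of the merge, so a custom plugin 
only has to return its tags and does not need to handle precedence itself.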


I’m going to be AFK for a while, but I’ll pick this up after that.

B.

> On 7 Dec 2018, at 20:12, Bolke de Bruin <[email protected]> wrote:
> 
> Hi Dan,
> 
> Thanks for thinking along. Answers inline again.
> 
> B.
> 
> Sent from my iPad
> 
>> On 7 Dec 2018, at 19:42, Don Bosco Durai <[email protected]> wrote:
>> 
>> Hi Bolke
>> 
>> Thanks for the suggestion and contribution.
>> 
>> I am trying to understand your approach. Can you add your patch to Review 
>> Board? It will make it easier to see the changes you have made.
> 
> Will do after fixing the selection of which tag version takes precedence 
> (see below).
> 
>> 
>> I have a few questions and design suggestions.
>> 1. When you say "Client", are you referring to "Plugin Implementation Code" 
>> or "End User" (similar to Accumulo model)
> 
> Plugin. I don’t trust the user :-). At least, if by plugin you mean the agent 
> inside, for example, Hive.
> 
>> 2. If you meant "Plugin" or "Custom Plugin", then I feel it is a good 
>> suggestion and we should support it. If it is the end user, then it is a 
>> longer discussion.
> 
> No long discussion then :-) 
> 
>> 3. Based on the discussion so far and reviewing the code change at high 
>> level, it seems you are extending at the Tag Enricher level. Alternatively, 
>> would it be more design friendly to provide a method/API in the plugin 
>> interface to return or override the tags, e.g. getTags(request)? Custom 
>> plugins can override this method and alter the tags to be returned. This 
>> might be a more isolated and cleaner implementation, so that plugin writers 
>> can focus only on their plugin implementation.
> 
> 
> That could work. Taking that a bit further I was always thinking that 
> ‘context’ was something to be supplied by the client. It caught me by 
> surprise that a client (i.e. plugin) can actually set no additional context 
> at all. So instead of a getTags I suggest an “updateContext” with namespaced 
> keys. I think that is more future-proof. However, it won’t be backwards 
> compatible (this is why I chose to implement it as it is now). So when you 
> upgrade Ranger you would need to update all your clients at once. Adding 
> getTags would do the same. What do you think?
> 
> I’m now working out which tag takes precedence, e.g. a system supplied one or 
> a client supplied one. I will use the updatedTime field for this. This would 
> need somewhat more complexity than I would expect a custom plugin to handle. 
> 
> 
>> 4. If needed, for advanced users, we can provide an interface or API to 
>> implement their own Tag Sync. Which could be in addition to Atlas/Kafka or 
>> exclusive to their environment or Meta Store.
> 
> I suggest making it possible to have multiple syncs, to set the order in 
> which they are evaluated, and to define a hierarchy that determines which 
> one can overwrite the other. But this is for later.
> 
>> 
>> Thanks
>> 
>> Bosco
>> 
>> 
>> 
>> On 12/5/18, 11:59 AM, "Bolke de Bruin" <[email protected]> wrote:
>> 
>>   Hi Abhay,
>> 
>>   Also answers inline.
>> 
>>   B.
>> 
>>   Sent from my iPad
>> 
>>> On 5 Dec 2018, at 20:25, Abhay Kulkarni <[email protected]> wrote:
>>> 
>>> Hi Bolke,
>>> 
>>> My comments inline.
>>> 
>>> Thanks,
>>> -Abhay
>>> 
>>>> On 12/4/18, 1:07 PM, "Bolke de Bruin" <[email protected]> wrote:
>>>> 
>>>> Hi Abhay,
>>>> 
>>>> Good point on #1; I will take that into account if possible (can an
>>>> enricher call audit events?).
>>>> 
>>>> On #2 yes, otherwise the resource matcher will stop working. Maybe proper
>>>> namespacing is the way to go here. Implementing it this way ensures
>>>> backwards compatibility. On a broader thought, I think Ranger is lacking
>>>> here. Context could also be provided by the client and there is no real
>>>> clean way of doing this at the moment.
>>> 
>>> Abhay> I will need to take a look to figure out why resource matcher will
>>> not work. However, instead of implementing a new API (removeValue()), is
>>> it possible to use setValue() API to set KEY_CLIENT_TAG entry to null?
>> 
>>   I don’t think that is possible. The resource matcher checks for elements, 
>> and setting it to null means it is still present, which means the signature 
>> still doesn’t match.
>> 
>>>> 
>>>> Question: should client tags only apply to SELF, or also
>>>> SELF_OR_DESCENDENT and ANCESTOR? I wasn’t sure here.
>>> 
>>> Abhay> I don’t see any issue, at this time, to apply client-tags when
>>> match-type is SELF, SELF_OR_DESCENDENT or ANCESTOR.
>> 
>>   This means a client tag will match against all of them at any time. The 
>> client isn’t aware of match-types. Correct?
>> 
>>>> 
>>>> Second question (a bit unrelated): how scalable is the tagsync approach?
>>>> If we have millions of tagged files and sources that all end up being
>>>> registered in Ranger, this could easily grow exponentially, besides
>>>> getting outdated. The other approach could be to have this handled in the
>>>> client (pick up info from the TagSource, i.e. Atlas, and supply this to
>>>> the policy engine).
>>> 
>>> Abhay> I see that there is some lag involved. But, overall, the
>>> architecture allows for tag-based policies (really ABAC way of
>>> authorization) to be applied across all components uniformly. Having
>>> ranger-admin as a central repository of policies and tags, and components
>>> as simply clients downloading these artifacts has many more advantages
>>> than each component having to do all the work by itself. Also, any Kafka
>>> delay would still be an issue even if components received tags directly
>>> from Atlas without ranger-admin mediating tag transfer. Moreover, there
>>> are several optimizations possible (such as incremental download of tags -
>>> not implemented yet) which can speed up tag downloads significantly. With
>>> a large number of tags, surely, the size of ranger-admin tag tables will
>>> increase, but IMO, it is a fair trade-off considering all other advantages
>>> this architecture provides us. Also, it will be useful to know the order
>>> of magnitude of delay you experienced (other than possibly up to 1 minute
>>> delay because of the interval between tag downloads).
>> 
>>   The one minute is already too much for us. The example I gave happens 
>> within a few milliseconds so basically any delay is not acceptable.
>> 
>>   To me it seems architecturally incorrect to have Ranger be a source for 
>> tags, as that is Atlas (or some other system). Ranger is duplicating things 
>> here rather than sticking to what it is good at: policies. Clients are already 
>> downloading tags, doing that from Atlas instead of Ranger is not adding a 
>> lot of complexity and can be handled in the plugin transparently. But that 
>> is just my opinion. 
>> 
>>   Maybe there is a possibility to accept client tags as temporary tags in 
>> Ranger that can then be overwritten by the Tag Store (i.e. Atlas). Just 
>> thinking out loud.
>> 
>>>> 
>>>> Cheers
>>>> Bolke
>>>> 
>>>> 
>>>> Sent from my iPad
>>>> 
>>>>> On 4 Dec 2018, at 21:51, Abhay Kulkarni <[email protected]> wrote:
>>>>> 
>>>>> Hi Bolke, 
>>>>> 
>>>>> This looks like a good addition to tag-based authorization in Ranger. I
>>>>> will review the patch separately. However, here are a few thoughts.
>>>>> 
>>>>> 1. If the client component is tag-aware and client-supplied tags
>>>>> overwrite admin-supplied tags, audit needs to record this very clearly.
>>>>> This will avoid any potential confusion about why the authorization
>>>>> decision was different only for a certain component (or a certain type
>>>>> of component).
>>>>> 
>>>>> 2. Do the client-supplied tags have to be removed from the
>>>>> access request?
>>>>> 
>>>>> Thanks,
>>>>> -Abhay
>>>>> 
>>>>>> On 12/4/18, 6:02 AM, "Bolke de Bruin" <[email protected]> wrote:
>>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> Ranger assumes that clients are tag-unaware, so the Tag Enricher depends
>>>>>> on a resource-to-tag mapping supplied externally by, for example, Apache
>>>>>> Atlas. We found out that making tags available in Ranger can involve a
>>>>>> prohibitive delay. For example, data arrives at the platform and is
>>>>>> being tagged programmatically in Apache Atlas. Atlas then puts the data
>>>>>> on Kafka and Ranger picks it up. The client (or another) needs to
>>>>>> refresh its policies before the tagging info becomes available for
>>>>>> evaluation. Typically, this can be too slow. Kafka introduces a lag and
>>>>>> the policy refresh also introduces a lag (tested).
>>>>>> 
>>>>>> If the client is tag-aware and could supply this information to the
>>>>>> plugin, policy evaluation could continue. I have created
>>>>>> https://issues.apache.org/jira/browse/RANGER-2302 to track this. I have
>>>>>> also created an initial patch. The patch allows a client to set the
>>>>>> special "RangerTagEnricher.KEY_CLIENT_TAGS" as a value in the access
>>>>>> request. This will then be picked up by the Tag Enricher. Currently,
>>>>>> client-supplied tags overwrite the system-supplied tags. The reason for
>>>>>> this is that the client might have more recent information. Most likely
>>>>>> this will need to be checked against the "updated" field in the tag
>>>>>> itself, but that wasn't readily available.
>>>>>> 
>>>>>> I am looking for feedback to see if we can have this in. Or are there
>>>>>> other ways to solve this?
>>>>>> 
>>>>>> Cheers
>>>>>> Bolke
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>>
