Re: Allow clients to supply tag information

Bolke de Bruin Fri, 07 Dec 2018 11:12:55 -0800

Hi Dan,

Thanks for thinking along. Answers inline again.


B.

Sent from my iPad

> On 7 Dec 2018, at 19:42, Don Bosco Durai <[email protected]> wrote:
> 
> Hi Bolke
> 
> Thanks for the suggestion and contribution.
> 
> I am trying to understand your approach. Can you add your patch to Review 
> Board? It will be easier to see the changes you have made visually.

Will do after fixing the selection of which tag version takes precedence 
(see below).

> 
> I have a few questions and design suggestions.
> 1. When you say "Client", are you referring to "Plugin Implementation Code" 
> or "End User" (similar to the Accumulo model)?

Plugin. I don’t trust the user :-). That is, if by plugin you mean the agent 
inside, for example, Hive.

> 2. If you meant "Plugin" or "Custom Plugin", then I feel it is a good 
> suggestion and we should support it. If it is end-user, then it is a longer 
> discussion.

No long discussion then :-) 

> 3. Based on the discussion so far and reviewing the code change at a high 
> level, it seems you are extending at the Tag Enricher level. Alternatively, 
> would it be more design-friendly to provide a method/API in the plugin 
> interface to return or override the tags, e.g. getTags(request)? Custom 
> plugins could override this method and alter the tags to be returned. This 
> might be a more isolated and cleaner implementation, so that plugin writers 
> can focus only on their plugin implementation.
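The following is a minimal sketch of the hook suggested in point 3. AccessRequest, BasePlugin, TagAwarePlugin and this getTags signature are hypothetical names used for illustration. They are not Ranger's actual plugin API. By default the hook passes through the tags the enricher found, and a tag-aware plugin adds its own:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, minimal access request: only the resource path matters here.
class AccessRequest {
    final String resource;

    AccessRequest(String resource) {
        this.resource = resource;
    }
}

// Hypothetical base plugin (not RangerBasePlugin): by default it returns
// the tags resolved from the admin-supplied tag store unchanged.
class BasePlugin {
    Set<String> getTags(AccessRequest request, Set<String> enricherTags) {
        return enricherTags;
    }
}

// A custom plugin that already knows about tags the tag store has not
// received yet, and adds them to the result.
class TagAwarePlugin extends BasePlugin {
    private final Map<String, List<String>> localTags;

    TagAwarePlugin(Map<String, List<String>> localTags) {
        this.localTags = localTags;
    }

    @Override
    Set<String> getTags(AccessRequest request, Set<String> enricherTags) {
        Set<String> tags = new HashSet<>(enricherTags);
        tags.addAll(localTags.getOrDefault(request.resource, List.of()));
        return tags;
    }
}
```

With this design, only plugins that override the hook need to know where the extra tags come from.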


That could work. Taking that a bit further, I always assumed that ‘context’ 
was something the client could supply. It caught me by surprise that a 
client (i.e. plugin) cannot actually set any additional context at all. So 
instead of getTags I suggest an “updateContext” with namespaced keys. I think 
that is more future-proof. However, it won’t be backwards compatible (this is 
why I chose to implement it the way it is now). So when you upgrade Ranger you 
would need to update all your clients at once. Adding getTags would do the 
same. What do you think?
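Here is a small sketch of the “updateContext” idea. It assumes a reserved “client.” prefix as the namespace for client-supplied keys. The prefix, the class name and the method names are illustrative only. Client keys outside the namespace are rejected, so a client cannot shadow keys that the resource matcher or enrichers depend on:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical request context with namespaced keys (not Ranger's API).
class NamespacedContext {
    // Prefix reserved for client-supplied entries (an assumption).
    static final String CLIENT_NS = "client.";

    private final Map<String, Object> entries = new HashMap<>();

    // System components, e.g. an enricher, may write any key.
    void putSystem(String key, Object value) {
        entries.put(key, value);
    }

    // Clients may only write inside their namespace. All keys are validated
    // before anything is written, so a rejected update changes nothing.
    void updateContext(Map<String, ?> clientEntries) {
        for (String key : clientEntries.keySet()) {
            if (!key.startsWith(CLIENT_NS)) {
                throw new IllegalArgumentException("client key outside namespace: " + key);
            }
        }
        entries.putAll(clientEntries);
    }

    Object get(String key) {
        return entries.get(key);
    }
}
```

New kinds of client context would then only need a new key under the prefix, not a new method. The compatibility cost described above still applies once.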

I’m now working out which tag takes precedence, i.e. a system-supplied one or a 
client-supplied one. I will use the updatedTime field for this. This requires 
somewhat more complexity than I would expect a custom plugin to handle. 
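One way to resolve precedence with an update timestamp is sketched below. It rests on several assumptions:

- Tags are keyed by tag type, and attributes are ignored.
- Each list holds at most one tag per type.
- Timestamps from Atlas and from the client are comparable, i.e. there is no significant clock skew.

Tag and TagPrecedence are illustrative classes, not Ranger's RangerTag:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in for a tag; only the type and update time matter here.
class Tag {
    final String type;
    final long updatedTime; // epoch millis, assumed comparable across sources

    Tag(String type, long updatedTime) {
        this.type = type;
        this.updatedTime = updatedTime;
    }
}

class TagPrecedence {
    // For each tag type keep the most recently updated version, whether it
    // came from the tag store or the client. Ties go to the system tag.
    static Map<String, Tag> merge(List<Tag> systemTags, List<Tag> clientTags) {
        Map<String, Tag> result = new HashMap<>();
        for (Tag t : systemTags) {
            result.put(t.type, t);
        }
        for (Tag t : clientTags) {
            Tag existing = result.get(t.type);
            if (existing == null || t.updatedTime > existing.updatedTime) {
                result.put(t.type, t);
            }
        }
        return result;
    }
}
```

Breaking ties in favour of the system-supplied tag is an arbitrary but conservative choice.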


> 4. If needed, for advanced users, we can provide an interface or API to 
> implement their own Tag Sync, which could be in addition to Atlas/Kafka or 
> exclusive to their environment or Meta Store.

I suggest making it possible to have multiple syncs, with a configurable order 
in which they are evaluated and a hierarchy that determines which one can 
overwrite the other. But this is for later.
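Below is a rough sketch of ordered, layered tag sources. It assumes each source exposes a numeric priority and that a higher priority overwrites a lower one for the same tag type. PrioritizedTagSource, StaticTagSource and LayeredTagResolver are hypothetical names. They are not Ranger's tagsync classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pluggable source of tags (e.g. Atlas, a metastore, a custom sync).
interface PrioritizedTagSource {
    int priority(); // higher value overwrites lower for the same tag type

    Map<String, String> tagsFor(String resource); // tag type -> attribute value
}

// Simple in-memory source, for illustration and testing.
class StaticTagSource implements PrioritizedTagSource {
    private final int priority;
    private final Map<String, Map<String, String>> tags;

    StaticTagSource(int priority, Map<String, Map<String, String>> tags) {
        this.priority = priority;
        this.tags = tags;
    }

    public int priority() {
        return priority;
    }

    public Map<String, String> tagsFor(String resource) {
        return tags.getOrDefault(resource, Map.of());
    }
}

class LayeredTagResolver {
    private final List<PrioritizedTagSource> sources = new ArrayList<>();

    void register(PrioritizedTagSource source) {
        sources.add(source);
        // Keep sources ordered from lowest to highest priority.
        sources.sort(Comparator.comparingInt(PrioritizedTagSource::priority));
    }

    // Evaluate in ascending priority so that higher-priority sources
    // overwrite earlier ones when they report the same tag type.
    Map<String, String> resolve(String resource) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (PrioritizedTagSource source : sources) {
            merged.putAll(source.tagsFor(resource));
        }
        return merged;
    }
}
```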

> 
> Thanks
> 
> Bosco
> 
> 
> 
> On 12/5/18, 11:59 AM, "Bolke de Bruin" <[email protected]> wrote:
> 
>    Hi Abhay,
> 
>    Also answers inline.
> 
>    B.
> 
> 
>> On 5 Dec 2018, at 20:25, Abhay Kulkarni <[email protected]> wrote:
>> 
>> Hi Bolke,
>> 
>> My comments inline.
>> 
>> Thanks,
>> -Abhay
>> 
>>> On 12/4/18, 1:07 PM, "Bolke de Bruin" <[email protected]> wrote:
>>> 
>>> Hi Abhay,
>>> 
>>> Good point on #1; I will take that into account if possible (can an
>>> enricher call audit events?).
>>> 
>>> On #2: yes, otherwise the resource matcher will stop working. Maybe proper
>>> namespacing is the way to go here. Implementing it this way ensures
>>> backwards compatibility. More broadly, I think Ranger is lacking here.
>>> Context could also be provided by the client, and at the moment there is
>>> no clean way of doing this.
>> 
>> Abhay> I will need to take a look to figure out why the resource matcher
>> will not work. However, instead of implementing a new API (removeValue()),
>> is it possible to use the setValue() API to set the KEY_CLIENT_TAG entry to
>> null?
> 
>    I don’t think that is possible. The resource matcher checks which keys 
> are present, and setting the value to null means the key is still present, 
> which means the signature still doesn’t match.
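The distinction can be shown with a plain java.util.Map. This assumes, as described above, that the matcher looks at which keys are present rather than at their values. The key string is hypothetical. In a HashMap, mapping a key to null keeps the key, while remove() drops it:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NullVsRemove {
    // Hypothetical key name standing in for the client-tags context key.
    static final String KEY = "CLIENT_TAGS";

    // Equivalent of setValue(KEY, null): the key stays in the map.
    static boolean keyPresentAfterSetNull() {
        Map<String, Object> context = new HashMap<>();
        context.put(KEY, List.of("PII"));
        context.put(KEY, null);
        return context.containsKey(KEY);
    }

    // Equivalent of a removeValue(KEY): the key is gone.
    static boolean keyPresentAfterRemove() {
        Map<String, Object> context = new HashMap<>();
        context.put(KEY, List.of("PII"));
        context.remove(KEY);
        return context.containsKey(KEY);
    }

    public static void main(String[] args) {
        System.out.println(keyPresentAfterSetNull()); // true
        System.out.println(keyPresentAfterRemove());  // false
    }
}
```

In both cases context.get(KEY) returns null. Only a key-based check, such as the matcher's, can tell the two apart.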
> 
>>> 
>>> Question: should client tags only apply to SELF, or also to
>>> SELF_OR_DESCENDENT and ANCESTOR? I wasn’t sure here.
>> 
>> Abhay> I don’t see any issue, at this time, with applying client tags when
>> the match-type is SELF, SELF_OR_DESCENDENT or ANCESTOR.
> 
>    This means a client tag will match against all of them at any time. The 
> client isn’t aware of match-types. Correct?
> 
>>> 
>>> Second question (a bit unrelated): how scalable is the tagsync approach?
>>> If we have millions of tagged files and sources, and they all end up being
>>> registered in Ranger, this could easily grow exponentially, besides
>>> getting outdated. The other approach could be to have this handled in the
>>> client (pick up the info from the TagSource, i.e. Atlas, and supply it to
>>> the policy engine).
>> 
>> Abhay> I see that there is some lag involved. But, overall, the
>> architecture allows tag-based policies (really the ABAC way of
>> authorization) to be applied uniformly across all components. Having
>> ranger-admin as a central repository of policies and tags, with components
>> simply acting as clients that download these artifacts, has many more
>> advantages than each component having to do all the work by itself. Also,
>> any Kafka delay would still be an issue even if components received tags
>> directly from Atlas, without ranger-admin mediating the tag transfer.
>> Moreover, several optimizations are possible (such as incremental download
>> of tags, not implemented yet) which can speed up tag downloads
>> significantly. With a large number of tags the ranger-admin tag tables
>> will surely grow, but IMO it is a fair trade-off considering all the other
>> advantages this architecture provides. Also, it would be useful to know the
>> order of magnitude of the delay you experienced (other than the possible
>> delay of up to 1 minute caused by the interval between tag downloads).
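Incremental tag download, mentioned above as not implemented yet, could look roughly like the following. The server keeps a versioned change log, and a client asks only for changes newer than the version it already holds. VersionedTagStore and TagCache are hypothetical. To keep the sketch short, a resource carries at most one tag. Log truncation, which would force a full download, is left out:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical server-side store that records every change with a version.
class VersionedTagStore {
    static final class Change {
        final long version;
        final String resource;
        final String tag; // null means the tag was removed from the resource

        Change(long version, String resource, String tag) {
            this.version = version;
            this.resource = resource;
            this.tag = tag;
        }
    }

    private final List<Change> log = new ArrayList<>();
    private long version = 0;

    void setTag(String resource, String tag) {
        version++;
        log.add(new Change(version, resource, tag));
    }

    // Only changes strictly newer than the caller's version are returned.
    List<Change> changesSince(long sinceVersion) {
        List<Change> result = new ArrayList<>();
        for (Change c : log) {
            if (c.version > sinceVersion) {
                result.add(c);
            }
        }
        return result;
    }
}

// Hypothetical plugin-side cache that applies deltas instead of re-downloading.
class TagCache {
    final Map<String, String> tags = new HashMap<>();
    long version = 0;

    void refresh(VersionedTagStore store) {
        for (VersionedTagStore.Change c : store.changesSince(version)) {
            if (c.tag == null) {
                tags.remove(c.resource);
            } else {
                tags.put(c.resource, c.tag);
            }
            version = c.version;
        }
    }
}
```

Each refresh transfers only the changes since the client's version, not every tagged resource. In practice, the linear scan over the log here would be replaced by an index.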
> 
>    The one minute is already too much for us. The example I gave happens 
> within a few milliseconds, so basically no delay is acceptable.
> 
>    To me it seems architecturally incorrect to have Ranger be a source of 
> tags, as that role belongs to Atlas (or some other system). Ranger is 
> duplicating things here rather than sticking to what it is good at: 
> policies. Clients are already downloading tags; doing that from Atlas 
> instead of Ranger does not add a lot of complexity and can be handled in 
> the plugin transparently. But that is just my opinion. 
> 
>    Maybe there is a possibility to accept client tags as temporary tags in 
> Ranger that can then be overwritten by the Tag Store (i.e. Atlas). Just 
> thinking out loud.
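The idea can also be thought out loud in code. This sketch accepts client tags provisionally and lets the tag store win once it has an entry for the same resource. ProvisionalTagOverlay and its methods are hypothetical. Treating "the tag store has an entry" as "Atlas has caught up" is a simplifying assumption:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical overlay for client-supplied tags that are only trusted
// until the authoritative tag store knows about the resource.
class ProvisionalTagOverlay {
    private final Map<String, Set<String>> provisional = new HashMap<>();

    void acceptClientTags(String resource, Set<String> tags) {
        provisional.put(resource, Set.copyOf(tags));
    }

    // Returns the tag store's tags when it has an entry for the resource
    // (and drops the provisional entry); otherwise falls back to client tags.
    Set<String> effectiveTags(String resource, Map<String, Set<String>> tagStore) {
        Set<String> fromStore = tagStore.get(resource);
        if (fromStore != null) {
            provisional.remove(resource);
            return fromStore;
        }
        return provisional.getOrDefault(resource, Set.of());
    }

    boolean hasProvisional(String resource) {
        return provisional.containsKey(resource);
    }
}
```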
> 
>>> 
>>> Cheers
>>> Bolke
>>> 
>>> 
>>> 
>>>> On 4 Dec 2018, at 21:51, Abhay Kulkarni <[email protected]> wrote:
>>>> 
>>>> Hi Bolke, 
>>>> 
>>>> This looks like a good addition to tag-based authorization in Ranger. I
>>>> will review the patch separately. However, here are a few thoughts.
>>>> 
>>>> 1. If the client component is tag-aware and client-supplied tags
>>>> overwrite admin-supplied tags, audit needs to record this very clearly.
>>>> This will avoid any potential confusion about why the authorization
>>>> decision was different only for a certain component (or type of
>>>> component).
>>>> 
>>>> 2. Do the client-supplied tags have to be removed from the access-request?
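For point 1, the audit detail could take roughly the following shape. The sketch assumes tags are compared by type only, with attributes omitted. It also assumes that client tags replace admin tags of the same type, which is the behaviour of the current patch. TagAuditDetail and its field names are hypothetical, not Ranger's audit schema:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical audit detail recording where each evaluated tag came from.
class TagAuditDetail {
    final Map<String, String> tagOrigin = new TreeMap<>(); // tag type -> "admin" or "client"
    final Set<String> overriddenAdminTags = new TreeSet<>();

    // Client tags replace admin tags of the same type; each replacement is
    // recorded so the audit log can explain a different decision.
    static TagAuditDetail of(Set<String> adminTagTypes, Set<String> clientTagTypes) {
        TagAuditDetail detail = new TagAuditDetail();
        for (String type : adminTagTypes) {
            detail.tagOrigin.put(type, "admin");
        }
        for (String type : clientTagTypes) {
            if (adminTagTypes.contains(type)) {
                detail.overriddenAdminTags.add(type);
            }
            detail.tagOrigin.put(type, "client");
        }
        return detail;
    }

    String toAuditString() {
        String s = "tags=" + tagOrigin;
        if (!overriddenAdminTags.isEmpty()) {
            s += ", overriddenByClient=" + overriddenAdminTags;
        }
        return s;
    }
}
```

Sorted collections keep the audit string stable, which makes entries easier to compare.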
>>>> 
>>>> Thanks,
>>>> -Abhay
>>>> 
>>>>> On 12/4/18, 6:02 AM, "Bolke de Bruin" <[email protected]> wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Ranger assumes that clients are tag-unaware, so the Tag Enricher
>>>>> depends on a resource-to-tag mapping supplied externally by, for
>>>>> example, Apache Atlas. We found that making tags available in Ranger
>>>>> can take prohibitively long. For example, data arrives at the platform
>>>>> and is tagged programmatically in Apache Atlas. Atlas then puts the data
>>>>> on Kafka and Ranger picks it up. The client (or another) needs to
>>>>> refresh its policies before the tagging info becomes available for
>>>>> evaluation. Typically, this can be too slow: Kafka introduces a lag and
>>>>> the policy refresh also introduces a lag (tested).
>>>>> 
>>>>> If the client is tag-aware and could supply this information to the
>>>>> plugin, policy evaluation could continue. I have created
>>>>> https://issues.apache.org/jira/browse/RANGER-2302 to track this. I have
>>>>> also created an initial patch. The patch allows a client to set a value
>>>>> for the special “RangerTagEnricher.KEY_CLIENT_TAGS” key in the access
>>>>> request. This will then be picked up by the Tag Enricher. Currently,
>>>>> client-supplied tags overwrite the system-supplied tags. The reason for
>>>>> this is that the client might have more recent information. Most likely
>>>>> this will need to be checked against the “updated” field in the tag
>>>>> itself, but that wasn't readily available.
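Here is a simplified model of the flow described above:

1. The client puts its tags under a special key in the request context.
2. The enricher uses them in place of the system-supplied tags.
3. The key is removed again so that it does not disturb resource matching.

This is not the RANGER-2302 patch itself. The class name and the key's string value are hypothetical, and tags are plain strings:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative model only; not Ranger's RangerTagEnricher.
class ClientTagEnricherSketch {
    // Hypothetical value for the special context key.
    static final String KEY_CLIENT_TAGS = "KEY_CLIENT_TAGS";

    private final Map<String, Set<String>> systemTags; // resource -> tags from the tag store

    ClientTagEnricherSketch(Map<String, Set<String>> systemTags) {
        this.systemTags = systemTags;
    }

    // Client-supplied tags, when present, overwrite the system-supplied ones.
    // The key is removed from the (mutable) context in either case.
    Set<String> enrich(String resource, Map<String, Object> context) {
        Object supplied = context.remove(KEY_CLIENT_TAGS);
        if (supplied instanceof Set<?>) {
            Set<String> clientTags = new HashSet<>();
            for (Object tag : (Set<?>) supplied) {
                clientTags.add(String.valueOf(tag));
            }
            return clientTags;
        }
        return systemTags.getOrDefault(resource, Set.of());
    }
}
```

Replacing the system tags wholesale mirrors the current behaviour, in which client-supplied tags overwrite system-supplied ones. A per-tag check of the update time would refine this.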
>>>>> 
>>>>> I am looking for feedback to see if we can have this in. Or are there
>>>>> other ways to solve this?
>>>>> 
>>>>> Cheers
>>>>> Bolke
