Hi Bolke

Thanks for the suggestion and contribution.

I am trying to understand your approach. Could you add your patch to Review 
Board? That would make it easier to see your changes visually.

I have a few questions and design suggestions.
1. When you say "Client", are you referring to the "Plugin Implementation Code" 
or the "End User" (similar to the Accumulo model)?
2. If you meant "Plugin" or "Custom Plugin", then I feel it is a good 
suggestion and we should support it. If it is the end user, then it is a longer 
discussion.
3. Based on the discussion so far and a high-level review of the code change, 
it seems you are extending at the Tag Enricher level. Alternatively, would it 
be more design friendly to provide a method/API in the plugin interface to 
return or override the tags, e.g. getTags( request )? Custom plugins can 
override this method and alter the tags to be returned. This might be a more 
isolated and cleaner implementation, so that plugin writers can focus only on 
their plugin implementation.
4. If needed, for advanced users, we can provide an interface or API to 
implement their own Tag Sync, which could be in addition to Atlas/Kafka or 
exclusive to their environment or Meta Store.
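
To make point 3 concrete, here is a minimal Java sketch of what such a hook 
might look like. All names in it (AccessRequest, BasePlugin, FreshTagPlugin and 
the getTags signature) are hypothetical and are not existing Ranger classes. 
The "local tags win when present" rule is an assumption that mirrors how the 
patch is described in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: every type here is hypothetical, not part of Ranger.
class GetTagsSketch {

    // Stand-in for an access request; it carries only the resource name.
    static final class AccessRequest {
        final String resource;

        AccessRequest(String resource) {
            this.resource = resource;
        }
    }

    // Default behaviour: tags come from the mapping delivered by tag sync.
    static class BasePlugin {
        private final Map<String, Set<String>> syncedTags;

        BasePlugin(Map<String, Set<String>> syncedTags) {
            this.syncedTags = syncedTags;
        }

        Set<String> getTags(AccessRequest request) {
            Set<String> tags = syncedTags.get(request.resource);
            return tags != null ? tags : Collections.<String>emptySet();
        }
    }

    // A custom plugin overrides getTags() to prefer tags it knows locally
    // and falls back to the synced tags otherwise.
    static class FreshTagPlugin extends BasePlugin {
        private final Map<String, Set<String>> localTags;

        FreshTagPlugin(Map<String, Set<String>> syncedTags,
                       Map<String, Set<String>> localTags) {
            super(syncedTags);
            this.localTags = localTags;
        }

        @Override
        Set<String> getTags(AccessRequest request) {
            Set<String> local = localTags.get(request.resource);
            return local != null ? local : super.getTags(request);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> synced = new HashMap<>();
        synced.put("/data/old.csv", new HashSet<>(Arrays.asList("PUBLIC")));

        Map<String, Set<String>> local = new HashMap<>();
        local.put("/data/new.csv", new HashSet<>(Arrays.asList("PII")));

        BasePlugin plugin = new FreshTagPlugin(synced, local);
        // The freshly arrived file is not yet known to tag sync.
        System.out.println(plugin.getTags(new AccessRequest("/data/new.csv")));
        // The older file still resolves through the synced mapping.
        System.out.println(plugin.getTags(new AccessRequest("/data/old.csv")));
    }
}
```

With a hook like this, the policy engine and the Tag Enricher would not need 
to know where the tags came from; only the plugin author would.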

Thanks

Bosco



On 12/5/18, 11:59 AM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:

    Hi Abhay,
    
    Also answers inline.
    
    B.
    
    Sent from my iPad
    
    > On 5 Dec 2018, at 20:25, Abhay Kulkarni <akulka...@hortonworks.com> wrote:
    > 
    > Hi Bolke,
    > 
    > My comments inline.
    > 
    > Thanks,
    > -Abhay
    > 
    >> On 12/4/18, 1:07 PM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:
    >> 
    >> Hi Abhay,
    >> 
    >> Good point on #1; I will take that into account if possible (can an
    >> enricher call audit events?).
    >> 
    >> On #2 yes, otherwise the resource matcher will stop working. Maybe proper
    >> namespacing is the way to go here. Implementing it this way ensures
    >> backwards compatibility. On a broader thought, I think Ranger is lacking
    >> here. Context could also be provided by the client and there is no real
    >> clean way of doing this at the moment.
    > 
    > Abhay> I will need to take a look to figure out why resource matcher will
    > not work. However, instead of implementing a new API (removeValue()), is
    > it possible to use setValue() API to set KEY_CLIENT_TAG entry to null?
    
    I don’t think that is possible. The resource matcher checks for elements 
and setting it to null means it is present which means the signature still 
doesn’t match.
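
The distinction can be illustrated with a plain java.util.HashMap standing in 
for the request context. This is only a sketch: the key name is hypothetical, 
and it assumes the matcher does a presence check on the key, as described 
above. It is not Ranger's actual matcher code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a HashMap stands in for the access-request context.
class ContextKeySketch {

    // Hypothetical key name, used only for this example.
    static final String KEY = "CLIENT_TAGS";

    // Assumed matcher behaviour: only checks whether the key is present.
    static boolean hasClientTags(Map<String, Object> context) {
        return context.containsKey(KEY);
    }

    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<>();
        context.put(KEY, "PII");

        // Setting the value to null keeps the key in the map.
        context.put(KEY, null);
        System.out.println(hasClientTags(context)); // prints "true"

        // Only removing the entry makes the key disappear.
        context.remove(KEY);
        System.out.println(hasClientTags(context)); // prints "false"
    }
}
```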
    
    >> 
    >> Question: should client tags apply only to SELF, or also to
    >> SELF_OR_DESCENDENT and ANCESTOR? I wasn’t sure here.
    > 
    > Abhay> I don’t see any issue, at this time, to apply client-tags when
    > match-type is SELF, SELF_OR_DESCENDENT or ANCESTOR.
    
    This means a client tag will match against all of them at any time. The 
client isn’t aware of match-types. Correct?
    
    >> 
    >> Second question (a bit unrelated): how scalable is the tagsync approach?
    >> If we have millions of tagged files and sources that all end up being
    >> registered in Ranger, this could easily grow exponentially, besides
    >> getting outdated. The other approach could be to have this handled in the
    >> client (pick up info from the TagSource, i.e. Atlas, and supply this to
    >> the policy engine).
    > 
    > Abhay> I see that there is some lag involved. But, overall, the
    > architecture allows for tag-based policies (really ABAC way of
    > authorization) to be applied across all components uniformly. Having
    > ranger-admin as a central repository of policies and tags, and components
    > as simply clients downloading these artifacts has many more advantages
    > than each component having to do all the work by itself. Also, any Kafka
    > delay would remain an issue even if components received tags directly
    > from Atlas without ranger-admin mediating tag transfer. Moreover, there
    > are several optimizations possible (such as incremental download of tags -
    > not implemented yet) which can speed up tag downloads significantly. With
    > a large number of tags, surely, the size of ranger-admin tag tables will
    > increase, but IMO, it is a fair trade-off considering all other advantages
    > this architecture provides us. Also, it will be useful to know the order
    > of magnitude of delay you experienced (other than possibly up to 1 minute
    > delay because of the interval between tag downloads).
    
    The one minute is already too much for us. The example I gave happens 
within a few milliseconds so basically any delay is not acceptable.
    
    To me it seems architecturally incorrect to have Ranger be a source for 
tags, as that is Atlas (or some other system). Ranger is duplicating things 
here rather than sticking to what it is good at: policies. Clients are already 
downloading tags; doing that from Atlas instead of Ranger does not add a lot 
of complexity and can be handled in the plugin transparently. But that is just 
my opinion.
    
    Maybe there is a possibility to accept client tags as temporary in Ranger, 
to be overwritten later by the Tag Store (i.e. Atlas). Just thinking out 
loud.
    
    >> 
    >> Cheers
    >> Bolke
    >> 
    >> 
    >> Sent from my iPad
    >> 
    >>> On 4 Dec 2018, at 21:51, Abhay Kulkarni
    >>> <akulka...@hortonworks.com> wrote:
    >>> 
    >>> Hi Bolke, 
    >>> 
    >>> This looks like a good addition to tag-based authorization in Ranger. I
    >>> will review the patch separately. However, here are a few thoughts.
    >>> 
    >>> 1. If the client component is tag-aware and client-supplied tags
    >>> overwrite admin-supplied tags, audit needs to record this very clearly.
    >>> This will avoid any potential confusion about why the authorization
    >>> decision was different only for a certain component (or a certain type
    >>> of component).
    >>> 
    >>> 2. Do the client-supplied tags have to be removed from the
    >>> access-request?
    >>> 
    >>> Thanks,
    >>> -Abhay
    >>> 
    >>>> On 12/4/18, 6:02 AM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:
    >>>> 
    >>>> Hi All,
    >>>> 
    >>>> Ranger assumes that clients are tag unaware. So the Tag Enricher is
    >>>> dependent on a resource-to-tag mapping supplied externally by, for
    >>>> example, Apache Atlas. We found out that having tags available in
    >>>> Ranger can have a prohibitive delay. For example, data arrives at the
    >>>> platform and is being tagged programmatically in Apache Atlas. Atlas
    >>>> then puts the data on Kafka and Ranger picks it up. The client (or
    >>>> another) needs to refresh its policies before the tagging info becomes
    >>>> available for evaluation. Typically, this can be too slow. Kafka
    >>>> introduces a lag and the policy refresh also introduces a lag (tested).
    >>>> 
    >>>> If the client is tag aware and could supply this information to the
    >>>> plugin, policy evaluation could continue. I have created
    >>>> https://issues.apache.org/jira/browse/RANGER-2302 to track this. I
    >>>> have also created an initial patch. The patch allows a client to set
    >>>> the special "RangerTagEnricher.KEY_CLIENT_TAGS" as a value in the
    >>>> access request. This will then be picked up by the Tag Enricher.
    >>>> Currently, client-supplied tags overwrite the system-supplied tags. The
    >>>> reason for this is that the client might have more recent information.
    >>>> Most likely this will need to be checked against the "updated" field in
    >>>> the tag itself, but that wasn't readily available.
    >>>> 
    >>>> I am looking for feedback to see if we can have this in. Or are there
    >>>> other ways to solve this?
    >>>> 
    >>>> Cheers
    >>>> Bolke
    >>>> 
    >>>> 
    >>> 
    >> 
    > 
    

