Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Subversion Wiki" for 
change notification.

The "AuthzImprovements" page has been changed by StefanFuhrmann:
https://wiki.apache.org/subversion/AuthzImprovements

Comment:
Initial version.

New page:
= Improved authz =

== Goals ==

The current (1.9) implementation of authz is lacking in two areas:

 * Performance when a large number of paths needs to be checked.
 Affected operations are mainly checkout / export and log.

 * Support for wildcards.
 Subversion should support "*" or single, arbitrary path segments
 (with no "/" in them) and "**" for arbitrary length non-empty paths.
 Finally, classic wildcard patterns like "*foo*.bar" shall be supported.
 All wildcard usage applies to full path segments only, i.e. a '*'
 never matches a '/' except for the case of "/**/" where it matches
 one to many full segments.


== Terminology ==

A '''path''' consists of '''segments''', separated by "/".
Whildcards are valid segments and their interpretation depends
on the context.

A '''rule set''' contains a number of '''rules''', each
describing what '''users''' shall have which '''access rights'''.
Users might be specified indirectly by using '''groups'''.

A '''path rule''' specifies what rule set applies to a given path.


== Inheritance and disambiguation ==

These are the rules as of today, with rule 3 added to support
ambiguous path patches caused by pattern matching.

 1. Path rules implicitly apply to all paths in the respective
 sub-tree.

 2. If multiple path rules match a given repository path, only
 the path rule(s) that cover the most segments shall apply.

 3. If multiple path rules cover the same number of segments,
 only the one specified last in the authz file shall apply.

 4. If there are multiple rules in a rule set for a given
 user, the union of all rights granted shall apply.

 5. The following rule is implicitly added as the first rule
 given no access to anybody by default.
{{{
[/]
* =
}}}


== Design ==

The idea is to use combine the following approaches:

 * Preprocessed, tree-like data structures applied to segmented paths
 * Reduction of the tree to what applies to the current user
 * Pre-calculate recursive rights for early exist

In the initial implementation we may rely on the svn_config API
to provide us with parser functionality.  At that stage, we don't
need an independent representation of the whole file contents.


=== General workflow ===

Putting caching aside, the workflow involved three data models,
building on top of each other.

 * Parsing an authz file (from file system or repository)
 yields a consolidated hash (additive sections being combined
 automatically) of it contents in svn_config_t.

 * Filtered path rule tree
  * prefix tree with one node per segment
  * created on demand per user and repository
  * contains only rules that apply to the respective user and repository
  * multiple instances of that being cached in the svn_authz_t structure 
alongside the single svn_config_t
  * rule sets being reduced to access rights + order ID
  * each node knows min / max rights on all sub-nodes

 * Lookup state
  * access rights accordingly the latest matching path rule
  * list of tree nodes that may match sub-paths as we may need to follow 
multiple patterns
  * temporary data structure thats reused between queries to save on allocation 
and construction overhead


=== Data models ===

These are persistent in the sense that we will cache and reuse
them.  They do not cover transient data models that various
algorithms may use e.g. during authz parsing.

Unless indicated otherwise, all collections are ordered.

Only abstract types / the interface view is given here; the actual
implementation may be different.

==== Filtered path rule tree ====
{{{
filtered-tree :=
        root      : filtered-node       // exists due to default path rule

filtered-node :=
        segment   : string              // empty for root
        access    : ref to access-type  // empty if no path ends here
        min-rights: none | r | w | rw   // user's minimum rights on any path in 
the sub-tree
        max-rights: none | r | w | rw   // user's maximum rights on any path in 
the sub-tree
        repeat    : boolean             // set on nodes for "**" segments

        sub-nodes : { map following segment => filtered-node }[0..*]
                                        // one node per non-wildcard in 
following segment
                                        // empty if no rules on following 
segments

        // the following will be aggregated into a sub-structure to
        // facilitate a quick "has pattern sub-segment?" check
        any       : filtered-node       // empty if no path rule with "*" in 
following segment
        any-var   : filtered-node       // empty if no path rule with "**" in 
following segment
        prefixed  : { list of filtered-node }[0..*]
                                        // empty if no path rule like "bar*" in 
following segment
        suffixed  : { list of filtered-node }[0..*]
                                        // empty if no path rule like "*bar" in 
following segment
        complex   : { list of filtered-node }[0..*]
                                        // empty if no path rule with complex 
wildcard pattern
                                        // like "*for*bar" in following segment

access-type :=
        order-id  : integer             // increases with declaration order in 
the file
        rights    : none | r | w | rw
}}}

==== Lookup state ====
{{{
lookup-state :=
        access    : ref to access-type  // user's rights for this path
        min-rights: none | r | w | rw   // user's minimum rights on any path in 
the sub-tree
                                        // (aggregated over RIGHTS and all 
nodes in CURRENT)
        max-rights: none | r | w | rw   // user's maximum rights on any path in 
the sub-tree
                                        // (aggregated over RIGHTS and all 
nodes in CURRENT)
        current   : { ref to filtered-node }[0..*]
                                        // sub nodes of these may match the 
next segment
}}}
== Algorithms ==

=== Normalization ===

Wildcard sequences in paths in rule sets shall be normalized.
This is merely done to reduce matching costs later on.
{{{
        // Variable length wildcards must be the last in a sequence
        while path contains "/**/*/"
                replace occurrence with "/*/**/"

        // Consecutive variable length wildcards are redundant
        while path contains "/**/**/"
                replace occurrence with "/*/**/"

        // Trailing variable length wildcards are redundant
        if path ends with "/**"
                replace occurrence with "/*"

        // "**" within segment patterns are redundant
        while path contains "**" not immediately preceded by "/"
                replace occurrence with "*"
        while path contains "**" not immediately followed by "/"
                replace occurrence with "*"
}}}
=== Lookup ===

Since there is always an implicit path rule at the root for
all users, lookup is always necessary.
{{{
        init(tree : ref to filtered-tree):
                return new lookup-state
                        .rights = tree.root.rule
                        .current = { tree.root }, if has_subnodes(tree.root)
                                   { }, otherwise
}}}
Checking whether we need to continue in our path matching is trivial,
just check whether there are any potential matches to sub-paths.
{{{
        done(state : ref to lookup-state):
                return is_empty(state.current)
}}}
One step in the lookup iteration, i.e. matching the next segment,
is relatively simple as well.  For simplicity, we ignore the min / max
rights optimization here.
{{{
        latest(lhs : ref to access-type, rhs : ref to access-type):
                if is_empty(lhs)
                        return rhs
                if is_empty(rhs) or lhs.order-id > rhs.order-id
                        return lhs
                return rhs

        step(state : ref to lookup-state, segment : string):
                next = { }
                access = empty
                foreach node in state.current:
                        if node.repeat:
                                next += node
                                access = latest(access, node.access)
                        foreach sub-node in all-subnodes(node): // simplified
                                if matches(sub-node, segment):  // simplified
                                        next += sub-node
                                        access = latest(access, sub-node.access)

                if not is_empty(access)
                        state.rights = access
                state.current = next
}}}
So, the full lookup without caching is
{{{
        lookup(tree : ref to filtered-tree, path : string):
        state = init(tree)
        foreach segment in path
                if done(state)
                        break;
                state = step(state, segment)
        return state.rights
}}}

Reply via email to