On Mon, Feb 4, 2013 at 7:59 PM, Julian Foad <julianf...@btopenworld.com>wrote:

> Hi Stefan.
>
> > URL: http://svn.apache.org/viewvc?rev=1442088&view=rev
> > Log:
> > Improve DAG node / path lookup performance in FSFS.  Since this is a hot
> > path in many fs_repo-based functions (like fs_verify), 20 .. 30% time
> > saved here can easily become significant.
> >
> > The idea here is that get_dag() already has knowledge about the PATH
> > parameter and does not actually use all the result data returned by
> > open_path(). So, define a number of flags that allow a caller to tell
> > open_path() where it may take short-cuts based on the context.
> >
> > * subversion/libsvn_fs_fs/tree.c
> >   (dag_node_cache_set): add comment to document a rationale
> >   (enum open_path_flags_t): declare further flags
> >   (open_path): support the new flags to short-circuit portions of code
> >   (get_dag,
> >    fs_closest_copy,
> >    get_mergeinfo_for_path_internal): provide flags to open_path()
>
>
> > +  /* When this flag is set, don't bother to lookup the DAG node in
> > +     our caches because we already tried this.  Ignoring this flag
> > +     has no functional impact.  */
> > +  open_path_uncached = 2,
> > +
> > +  /* Assume that the path parameter is already in canonical form. */
> > +  open_path_is_canonical = 4,
>
> I don't like this 'is_canonical' flag.  In the rest of Subversion, nearly
> all APIs assume paths are canonical already, and for a good reason:
> re-canonicalizing at a lower level tends to be expensive because it happens
> more times and with less knowledge of whether it's needed.  I think it
> would be better (including faster) to simply require that the path is
> canonical, and make sure the callers honour that.
>
> I see that many of the functions in this file currently appear to be
> accepting and dealing with non-canonical paths, but I think the way to go
> is to change that.
>

Turned out not to be too much work to fix all callers.

> +  /* The caller does not care about the parent node chain but only
> > +     the final DAG node. */
> > +  open_path_node_only = 8
> > } open_path_flags_t;
> >
> >
> > @@ -871,6 +885,10 @@ typedef enum open_path_flags_t {
> >     callers that create new nodes --- we find the parent directory for
> >     them, and tell them whether the entry exists already.
> >
> > +   The remaining bits in FLAGS are hints that allow this function
> > +   to take shortcuts based on knowledge that the caller provides,
> > +   such as the fact that PATH is already in canonical form.
>
> Maybe give a different example.
>

Done.


> >     NOTE: Public interfaces which only *read* from the filesystem
> >     should not call this function directly, but should instead use
> >     get_dag().
> > @@ -884,23 +902,47 @@ open_path(parent_path_t **parent_path_p,
> >            apr_pool_t *pool)
> > {
> >    svn_fs_t *fs = root->fs;
> > -  dag_node_t *here; /* The directory we're currently looking at.  */
> > -  parent_path_t *parent_path; /* The path from HERE up to the root.  */
> > +  dag_node_t *here = NULL; /* The directory we're currently looking
> at.  */
> > +  parent_path_t *parent_path; /* The path from HERE up to the root. */
> >    const char *rest; /* The portion of PATH we haven't traversed yet.  */
> > -  const char *canon_path = svn_fs__is_canonical_abspath(path)
> > +
> > +  /* ensure a canonical path representation */
> > +  const char *canon_path = (   (flags & open_path_is_canonical)
> > +                            || svn_fs__is_canonical_abspath(path))
> >                           ? path
> >                           : svn_fs__canonicalize_abspath(path, pool);
>
> Here you seem also to have decided that when canonicalizing, it's more
> efficient to test with svn_fs__is_canonical_abspath() before calling
> svn_fs__canonicalize_abspath().  Are you sure that is true?  If it is true
> here (and I see you already have the same pattern in one of the callers),
> then perhaps it is true in general, so why not do it inside
> svn_fs__canonicalize_abspath() so all callers benefit?
>

It's now all inside svn_fs__canonicalize_abspath, which has now
subtly changed semantics. But all existing callers should be fine
with that.

> @@ -1142,7 +1195,9 @@ get_dag(dag_node_t **dag_node_p,
> >          {
> >            /* Call open_path with no flags, as we want this to return an
> >             * error if the node for which we are searching doesn't
> exist. */
>
> The comment was falsified by your change and needs updating.  (There are
> similar comments on open_path() calls elsewhere in the file, which probably
> should be updated in the same way even though those calls are still passing
> no flags.)
>
> > -          SVN_ERR(open_path(&parent_path, root, path, 0, NULL, pool));
> > +          SVN_ERR(open_path(&parent_path, root, path,
> > +                            open_path_uncached | open_path_is_canonical
> > +                            | open_path_node_only, NULL, pool));
>
>
> (This is the only significant place where the 'is_canonical' flag is used.)
>
> >            node = parent_path->node;
> >
> >            /* No need to cache our find -- open_path() will do that for
> us. */
> > @@ -3162,7 +3217,7 @@ static svn_error_t *fs_closest_copy(svn_
> >    if (kind == svn_node_none)
> >      return SVN_NO_ERROR;
> >    SVN_ERR(open_path(&copy_dst_parent_path, copy_dst_root, path,
> > -                    0, NULL, pool));
> > +                    open_path_node_only, NULL, pool));
> >    copy_dst_node = copy_dst_parent_path->node;
> >    if (!
> svn_fs_fs__id_check_related(svn_fs_fs__dag_get_id(copy_dst_node),
> >
> > svn_fs_fs__dag_get_id(parent_path->node)))
> > @@ -3775,7 +3830,8 @@ get_mergeinfo_for_path_internal(svn_merg
> >
> >    path = svn_fs__canonicalize_abspath(path, scratch_pool);
> >
> > -  SVN_ERR(open_path(&parent_path, rev_root, path, 0, NULL,
> scratch_pool));
> > +  SVN_ERR(open_path(&parent_path, rev_root, path,
> open_path_is_canonical,
> > +                    NULL, scratch_pool));
>
>
> Here you've left a plain 'canonicalize' in the caller, even though you
> appear to think that the version inside open_path() is more efficient.
> (The result of this 'canonicalize' here is *only* passed to open_path().)
>
>
> >    if (inherit == svn_mergeinfo_nearest_ancestor && !
> > parent_path->parent)
> >      return SVN_NO_ERROR;
>
>
> - Julian
>
>
Changes committed in r1448810 with a follow-up in r1448820.

Thanks for the review!

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
*

http://www.wandisco.com/subversion/download
*

Reply via email to