On Thu, Nov 15, 2012 at 6:16 PM, Julian Foad <julianf...@btopenworld.com>wrote:

> Hi Stefan.  Some review questions and comments...
>
> stef...@apache.org wrote:
> > URL: http://svn.apache.org/viewvc?rev=1397773&view=rev
> > Log:
> > Due to public request: apply rep-sharing to equal data reps within
> > the same transaction.
> >
> > The idea is simple. When writing a noderev to the txn folder,
> > write another file named by the rep's SHA1 and store the rep
> > struct in there. Lookup is then straight-forward.
>
> Please
>  could you update the FSFS structure document.  I assume it says what
> files are stored in the txn dir; if it doesn't it should.
>

Oops - hadn't thought about that! Done in r1411202.
All other changes in r1411209.

> * subversion/libsvn_fs_fs/fs_fs.c
> >   (svn_fs_fs__put_node_revision): also look for SHA1-named files
> >   (get_shared_rep): write SHA1-named files
> [...]
> > @@ -2603,7 +2607,32 @@ svn_fs_fs__put_node_revision(svn_fs_t *f
> >
> > -  return svn_io_file_close(noderev_file, pool);
> > +  SVN_ERR(svn_io_file_close(noderev_file, pool));
> > +
> > +  /* if rep sharing has been enabled and the noderev has a data rep
> > +   * and its SHA-1 is known, store the rep struct under its SHA1. */
>
> It
>  looks like we re-enter this 'if' block, and re-write the rep-reference
> file, every time we store a node-rev with the same representation --
> even if the rep we're storing was already found from this file.  This
> might not be much of an overhead, but it seems ugly.
>
> Could we
> avoid re-writing it in those cases?  Maybe by refactoring such that we
> write the rep-reference file immediately after writing the rep, instead
> of here in put_node_revision.
>

Yep. It turns out that rep_write_contents_close is the
only relevant user of that functionality. There, we can
also decide whether to store the sha1->rep mapping
or not. It also addresses the locking problem below.


> > +  if (sha1)
> > +   {
> > +    apr_file_t *rep_file;
> > +    const char *file_name = svn_dirent_join(path_txn_dir(fs, txn_id,\
> > +                                            pool), sha1, pool);
>
> Creating
>  the file name ('checksum_to_cstring' + 'path_txn_dir' +
> 'svn_dirent_join') is common to both this function and the function
> below where we read the file.  Factor it out, like the existing
> path_txn_node_rev() and similar functions, to make the commonality
> clear.
>

Done.


> > +    const char *rep_string = representation_string(noderev->data_rep,
> > +                                                   ffd->format,
> > +                                                   (noderev->kind
> > +                                                    == svn_node_dir),
> > +                                                   FALSE,
> > +                                                   pool);
> > +    SVN_ERR(svn_io_file_open(&rep_file, file_name,
> > +                             APR_WRITE | APR_CREATE | APR_TRUNCATE
> > +                             | APR_BUFFERED, APR_OS_DEFAULT, pool));
> > +
> > +    SVN_ERR(svn_io_file_write_full(rep_file, rep_string,
> > +                                   strlen(rep_string), NULL, pool));
> > +
> > +    SVN_ERR(svn_io_file_close(rep_file, pool));
> > +   }
> > +
> > +  return SVN_NO_ERROR;
> >  }
> >
> > @@ -7083,6 +7112,30 @@ get_shared_rep(representation_t **old_re
> >
> > +  /* look for intra-revision matches (usually data reps but not limited
> > +     to them in case props happen to look like some data rep)
> > +   */
> > +  if (*old_rep == NULL && rep->txn_id)
> > +    {
> > +      svn_node_kind_t kind;
> > +      const char *file_name
> > +        = svn_dirent_join(path_txn_dir(fs, rep->txn_id, pool),
> > +                          svn_checksum_to_cstring(rep->sha1_checksum,\
> > +                                                  pool), pool);
> > +
> > +      /* in our txn, is there a rep file named with the wanted SHA1?
> > +         If so, read it and use that rep. */
> > +      SVN_ERR(svn_io_check_path(file_name, &kind, pool));
>
> Instead
>  of stat'ing the file to decide whether to open it, it's more efficient
> and more 'robust' to just try to open it and then handle the potential
> failure, isn't it?
>

Access is serialized (see below). Checking for existence
should be more efficient because actual matches are
relatively rare and constructing an error object can be
expensive (NLS). svn_stringbuf_from_file2 is also a very
convenient function to use.

> +      if (kind == svn_node_file)
> > +        {
> > +          svn_stringbuf_t *rep_string;
> > +          SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name,\
> > +                                           pool));
>
> I'm
>  sure the answer is obvious to those familiar with FSFS, but just to be
> clear please could you tell me which locking
> mechanism guarantees that the opening and reading of this file cannot be
>  happening at the same time as this file is being created and written
> (by another thread or process)?
>

Writing representations is serialized using the protorev lock.
All access to the sha1->rep mapping is done from
rep_write_contents_close() now, which in turn, is protected
by that lock.


> Is it possible that when this
> file was being written, the writer crashed in such a way that the file
> is now present but incomplete, and do we want to be able to recover --
> that is, be able to re-open this txn and resume writing into it -- after
>  such a failure?  If so, we should catch errors here that result from
> reading incomplete data, and in such cases proceed as if the file was
> not found; or else write the file atomically so that that can't happen.
>

A crashed writer process may leave a corrupt protorev and / or
other incomplete files. There is no atomic incremental change
here. The caller (client) using the crashed process is supposed
to detect the crash and abandon the transaction.


> > +          SVN_ERR(read_rep_offsets_body(old_rep, rep_string->data,
> > +                                        rep->txn_id, FALSE, pool));
> > +        }
> > +    }
> > +
> >    /* Add information that is missing in the cached data. */
> >    if (*old_rep)
> >      {
>
> - Julian
>
>
Thanks for the review! The code should be multi-thread safe.

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
*

http://www.wandisco.com/subversion/download
*

Reply via email to