Hi Stefan.  Some review questions and comments...

stef...@apache.org wrote:
> URL: http://svn.apache.org/viewvc?rev=1397773&view=rev
> Log:
> Due to public request: apply rep-sharing to equal data reps within
> the same transaction. 
> 
> The idea is simple. When writing a noderev to the txn folder,
> write another file named by the rep's SHA1 and store the rep
> struct in there. Lookup is then straight-forward.

Please
 could you update the FSFS structure document.  I assume it says what 
files are stored in the txn dir; if it doesn't it should.

> * subversion/libsvn_fs_fs/fs_fs.c
>   (svn_fs_fs__put_node_revision): also look for SHA1-named files
>   (get_shared_rep): write SHA1-named files
[...]
> @@ -2603,7 +2607,32 @@ svn_fs_fs__put_node_revision(svn_fs_t *f
> 
> -  return svn_io_file_close(noderev_file, pool);
> +  SVN_ERR(svn_io_file_close(noderev_file, pool));
> +
> +  /* if rep sharing has been enabled and the noderev has a data rep
> +   * and its SHA-1 is known, store the rep struct under its SHA1. */

It
 looks like we re-enter this 'if' block, and re-write the rep-reference 
file, every time we store a node-rev with the same representation -- 
even if the rep we're storing was already found from this file.  This 
might not be much of an overhead, but it seems ugly.

Could we 
avoid re-writing it in those cases?  Maybe by refactoring such that we 
write the rep-reference file immediately after writing the rep, instead 
of here in put_node_revision.

> +  if (sha1)
> +   {
> +    apr_file_t *rep_file;
> +    const char *file_name = svn_dirent_join(path_txn_dir(fs, txn_id,\
> +                                            pool), sha1, pool);

Creating
 the file name ('checksum_to_cstring' + 'path_txn_dir' + 
'svn_dirent_join') is common to both this function and the function 
below where we read the file.  Factor it out, like the existing 
path_txn_node_rev() and similar functions, to make the commonality 
clear.

> +    const char *rep_string = representation_string(noderev->data_rep,
> +                                                   ffd->format,
> +                                                   (noderev->kind
> +                                                    == svn_node_dir),
> +                                                   FALSE,
> +                                                   pool);
> +    SVN_ERR(svn_io_file_open(&rep_file, file_name,
> +                             APR_WRITE | APR_CREATE | APR_TRUNCATE
> +                             | APR_BUFFERED, APR_OS_DEFAULT, pool));
> +
> +    SVN_ERR(svn_io_file_write_full(rep_file, rep_string,
> +                                   strlen(rep_string), NULL, pool));
> +
> +    SVN_ERR(svn_io_file_close(rep_file, pool));
> +   }
> +
> +  return SVN_NO_ERROR;
>  }
> 
> @@ -7083,6 +7112,30 @@ get_shared_rep(representation_t **old_re
> 
> +  /* look for intra-revision matches (usually data reps but not limited
> +     to them in case props happen to look like some data rep)
> +   */
> +  if (*old_rep == NULL && rep->txn_id)
> +    {
> +      svn_node_kind_t kind;
> +      const char *file_name
> +        = svn_dirent_join(path_txn_dir(fs, rep->txn_id, pool),
> +                          svn_checksum_to_cstring(rep->sha1_checksum,\
> +                                                  pool), pool);
> +
> +      /* in our txn, is there a rep file named with the wanted SHA1?
> +         If so, read it and use that rep. */
> +      SVN_ERR(svn_io_check_path(file_name, &kind, pool));

Instead
 of stat'ing the file to decide whether to open it, it's more efficient 
and more 'robust' to just try to open it and then handle the potential 
failure, isn't it?

> +      if (kind == svn_node_file)
> +        {
> +          svn_stringbuf_t *rep_string;
> +          SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name,\
> +                                           pool));

I'm
 sure the answer is obvious to those familiar with FSFS, but just to be clear 
please could you tell me which locking 
mechanism guarantees that the opening and reading of this file cannot be
 happening at the same time as this file is being created and written 
(by another thread or process)?

Is it possible that when this 
file was being written, the writer crashed in such a way that the file 
is now present but incomplete, and do we want to be able to recover -- 
that is, be able to re-open this txn and resume writing into it -- after
 such a failure?  If so, we should catch errors here that result from 
reading incomplete data, and in such cases proceed as if the file was 
not found; or else write the file atomically so that that can't happen.

> +          SVN_ERR(read_rep_offsets_body(old_rep, rep_string->data,
> +                                        rep->txn_id, FALSE, pool));
> +        }
> +    }
> +
>    /* Add information that is missing in the cached data. */
>    if (*old_rep)
>      {

- Julian

Reply via email to