Hi Stefan. Some review questions and comments... stef...@apache.org wrote: > URL: http://svn.apache.org/viewvc?rev=1397773&view=rev > Log: > Due to public request: apply rep-sharing to equal data reps within > the same transaction. > > The idea is simple. When writing a noderev to the txn folder, > write another file named by the rep's SHA1 and store the rep > struct in there. Lookup is then straight-forward.
Please could you update the FSFS structure document. I assume it says what files are stored in the txn dir; if it doesn't it should. > * subversion/libsvn_fs_fs/fs_fs.c > (svn_fs_fs__put_node_revision): also look for SHA1-named files > (get_shared_rep): write SHA1-named files [...] > @@ -2603,7 +2607,32 @@ svn_fs_fs__put_node_revision(svn_fs_t *f > > - return svn_io_file_close(noderev_file, pool); > + SVN_ERR(svn_io_file_close(noderev_file, pool)); > + > + /* if rep sharing has been enabled and the noderev has a data rep > + * and its SHA-1 is known, store the rep struct under its SHA1. */ It looks like we re-enter this 'if' block, and re-write the rep-reference file, every time we store a node-rev with the same representation -- even if the rep we're storing was already found from this file. This might not be much of an overhead, but it seems ugly. Could we avoid re-writing it in those cases? Maybe by refactoring such that we write the rep-reference file immediately after writing the rep, instead of here in put_node_revision. > + if (sha1) > + { > + apr_file_t *rep_file; > + const char *file_name = svn_dirent_join(path_txn_dir(fs, txn_id,\ > + pool), sha1, pool); Creating the file name ('checksum_to_cstring' + 'path_txn_dir' + 'svn_dirent_join') is common to both this function and the function below where we read the file. Factor it out, like the existing path_txn_node_rev() and similar functions, to make the commonality clear. > + const char *rep_string = representation_string(noderev->data_rep, > + ffd->format, > + (noderev->kind > + == svn_node_dir), > + FALSE, > + pool); > + SVN_ERR(svn_io_file_open(&rep_file, file_name, > + APR_WRITE | APR_CREATE | APR_TRUNCATE > + | APR_BUFFERED, APR_OS_DEFAULT, pool)); > + > + SVN_ERR(svn_io_file_write_full(rep_file, rep_string, > + strlen(rep_string), NULL, pool)); > + > + SVN_ERR(svn_io_file_close(rep_file, pool)); > + } > + > + return SVN_NO_ERROR; > } > > @@ -7083,6 +7112,30 @@ get_shared_rep(representation_t **old_re > > + /* look for intra-revision matches (usually data reps but not limited > + to them in case props happen to look like some data rep) > + */ > + if (*old_rep == NULL && rep->txn_id) > + { > + svn_node_kind_t kind; > + const char *file_name > + = svn_dirent_join(path_txn_dir(fs, rep->txn_id, pool), > + svn_checksum_to_cstring(rep->sha1_checksum,\ > + pool), pool); > + > + /* in our txn, is there a rep file named with the wanted SHA1? > + If so, read it and use that rep. */ > + SVN_ERR(svn_io_check_path(file_name, &kind, pool)); Instead of stat'ing the file to decide whether to open it, it's more efficient and more 'robust' to just try to open it and then handle the potential failure, isn't it? > + if (kind == svn_node_file) > + { > + svn_stringbuf_t *rep_string; > + SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name,\ > + pool)); I'm sure the answer is obvious to those familiar with FSFS, but just to be clear please could you tell me which locking mechanism guarantees that the opening and reading of this file cannot be happening at the same time as this file is being created and written (by another thread or process)? Is it possible that when this file was being written, the writer crashed in such a way that the file is now present but incomplete, and do we want to be able to recover -- that is, be able to re-open this txn and resume writing into it -- after such a failure? If so, we should catch errors here that result from reading incomplete data, and in such cases proceed as if the file was not found; or else write the file atomically so that that can't happen. > + SVN_ERR(read_rep_offsets_body(old_rep, rep_string->data, > + rep->txn_id, FALSE, pool)); > + } > + } > + > /* Add information that is missing in the cached data. */ > if (*old_rep) > { - Julian