On Thu, Nov 15, 2012 at 6:16 PM, Julian Foad <julianf...@btopenworld.com>wrote:
> Hi Stefan. Some review questions and comments... > > stef...@apache.org wrote: > > URL: http://svn.apache.org/viewvc?rev=1397773&view=rev > > Log: > > Due to public request: apply rep-sharing to equal data reps within > > the same transaction. > > > > The idea is simple. When writing a noderev to the txn folder, > > write another file named by the rep's SHA1 and store the rep > > struct in there. Lookup is then straight-forward. > > Please > could you update the FSFS structure document. I assume it says what > files are stored in the txn dir; if it doesn't it should. > Oops - hadn't thought about that! Done in r1411202. All other changes in r1411209. > * subversion/libsvn_fs_fs/fs_fs.c > > (svn_fs_fs__put_node_revision): also look for SHA1-named files > > (get_shared_rep): write SHA1-named files > [...] > > @@ -2603,7 +2607,32 @@ svn_fs_fs__put_node_revision(svn_fs_t *f > > > > - return svn_io_file_close(noderev_file, pool); > > + SVN_ERR(svn_io_file_close(noderev_file, pool)); > > + > > + /* if rep sharing has been enabled and the noderev has a data rep > > + * and its SHA-1 is known, store the rep struct under its SHA1. */ > > It > looks like we re-enter this 'if' block, and re-write the rep-reference > file, every time we store a node-rev with the same representation -- > even if the rep we're storing was already found from this file. This > might not be much of an overhead, but it seems ugly. > > Could we > avoid re-writing it in those cases? Maybe by refactoring such that we > write the rep-reference file immediately after writing the rep, instead > of here in put_node_revision. > Yep. It turns out that rep_write_contents_close is the only relevant user of that functionality. There, we can also decide whether to store the sha1->rep mapping or not. It also addresses the locking problem below. > > + if (sha1) > > + { > > + apr_file_t *rep_file; > > + const char *file_name = svn_dirent_join(path_txn_dir(fs, txn_id,\ > > + pool), sha1, pool); > > Creating > the file name ('checksum_to_cstring' + 'path_txn_dir' + > 'svn_dirent_join') is common to both this function and the function > below where we read the file. Factor it out, like the existing > path_txn_node_rev() and similar functions, to make the commonality > clear. > Done. > > + const char *rep_string = representation_string(noderev->data_rep, > > + ffd->format, > > + (noderev->kind > > + == svn_node_dir), > > + FALSE, > > + pool); > > + SVN_ERR(svn_io_file_open(&rep_file, file_name, > > + APR_WRITE | APR_CREATE | APR_TRUNCATE > > + | APR_BUFFERED, APR_OS_DEFAULT, pool)); > > + > > + SVN_ERR(svn_io_file_write_full(rep_file, rep_string, > > + strlen(rep_string), NULL, pool)); > > + > > + SVN_ERR(svn_io_file_close(rep_file, pool)); > > + } > > + > > + return SVN_NO_ERROR; > > } > > > > @@ -7083,6 +7112,30 @@ get_shared_rep(representation_t **old_re > > > > + /* look for intra-revision matches (usually data reps but not limited > > + to them in case props happen to look like some data rep) > > + */ > > + if (*old_rep == NULL && rep->txn_id) > > + { > > + svn_node_kind_t kind; > > + const char *file_name > > + = svn_dirent_join(path_txn_dir(fs, rep->txn_id, pool), > > + svn_checksum_to_cstring(rep->sha1_checksum,\ > > + pool), pool); > > + > > + /* in our txn, is there a rep file named with the wanted SHA1? > > + If so, read it and use that rep. */ > > + SVN_ERR(svn_io_check_path(file_name, &kind, pool)); > > Instead > of stat'ing the file to decide whether to open it, it's more efficient > and more 'robust' to just try to open it and then handle the potential > failure, isn't it? > Access is serialized (see below). Checking for existence should be more efficient because actual matches are relatively rare and constructing an error object can be expensive (NLS). svn_stringbuf_from_file2 is also a very convenient function to use. > + if (kind == svn_node_file) > > + { > > + svn_stringbuf_t *rep_string; > > + SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name,\ > > + pool)); > > I'm > sure the answer is obvious to those familiar with FSFS, but just to be > clear please could you tell me which locking > mechanism guarantees that the opening and reading of this file cannot be > happening at the same time as this file is being created and written > (by another thread or process)? > Writing representations is serialized using the protorev lock. All access to the sha1->rep mapping is done from rep_write_contents_close() now, which in turn, is protected by that lock. > Is it possible that when this > file was being written, the writer crashed in such a way that the file > is now present but incomplete, and do we want to be able to recover -- > that is, be able to re-open this txn and resume writing into it -- after > such a failure? If so, we should catch errors here that result from > reading incomplete data, and in such cases proceed as if the file was > not found; or else write the file atomically so that that can't happen. > A crashed writer process may leave a corrupt protorev and / or other incomplete files. There is no atomic incremental change here. The caller (client) using the crashed process is supposed to detect the crash and abandon the transaction. > > + SVN_ERR(read_rep_offsets_body(old_rep, rep_string->data, > > + rep->txn_id, FALSE, pool)); > > + } > > + } > > + > > /* Add information that is missing in the cached data. */ > > if (*old_rep) > > { > > - Julian > > Thanks for the review! The code should be multi-thread safe. -- Stefan^2. -- Certified & Supported Apache Subversion Downloads: * http://www.wandisco.com/subversion/download *