Julian Foad wrote on Fri, Jun 25, 2021 at 14:27:30 +0100: > Now filed as: https://issues.apache.org/jira/browse/SVN-4877 >
"Now"? The issue is a month old. > THE UNWANTED BEHAVIOUR > > The code we are concerned with is a section of the FSFS commit finalization > code, in commit_body() called from svn_fs_fs__commit(), in > subversion/libsvn_fs_fs/transaction.c, around line 3800. > > Within the following code section: > > 1. Get a write handle on the proto revision file. > 2. Complete the construction of the proto-rev file. > 3. Move the finished rev file into place. > 4. Unlock the proto-rev file. > 5. Continue with the rest of the commit finalization. > > If an error is thrown during steps 2 and 3, and the calling process re-tries > the commit, the second try fails with: > > 160044 - Cannot write to the prototype revision file of transaction '1-2' > because a previous representation is currently being written by this process > at subversion/libsvn_fs_fs/transaction.c:312 > > What is happening: > > - when the initial error is encountered, FSFS returns the error; > - FSFS does not call unlock_proto_rev()... > - ... so the proto-rev lock file remains on disk, with a file-lock held on > it and the "being_written" flag remaining set in the transaction object in > memory; > - on re-try, when the code tries to acquire a proto-rev write lock, it > finds the lock is already held, and throws SVN_ERR_FS_REP_BEING_WRITTEN. > Okay, so it fails with a wrong error message. > Subversion itself doesn't care. It never re-tries at this level. It always > discards the transaction if an error is returned. > Even under mod_dav_svn when the client is a generic DAV client? > DESIRED BEHAVIOUR > > The commit re-try should fail in exactly the same way as the first try, if > the underlying cause of the error persists. > > Or, if the underlying cause of the error was transient and has gone away, > then the commit should succeed. > I think it's possible that the second try would legitimately fail with a different error message than the first try. Thus, it's not clear what the desired _change_ in behaviour is. > PROPOSED FIX > > My proposed fix is to unlock the lock when exiting that code section, no > matter whether successful or throwing an error. > > Pseudo-Python: > > try: > 1. Get a write handle on the proto revision file. > 2. Complete the construction of the proto-rev file. > 3. Move the finished rev file into place. > finally: > 4. Unlock the proto-rev file. > 5. Continue with the rest of the commit finalization. > Why would this be correct? Why would this have no downside?