Hi Stefan, I recently stumbled upon this changeset and I had some time to review it. I think that the things should not be done this way. As I see it, this change breaks one fundamental aspect of the non-incremental hotcopy, adds an ugly hack to the fs_hotcopy() layer and can lead to a potential deadlock between 'hotcopy' and 'pack' operations. See my comments inline.
> - hotcopy_setup_shared_fs_data(src_fs, dst_fs); > - SVN_ERR(svn_fs_fs__initialize_caches(dst_fs, pool)); > + /* FS creation is complete. Stamp it with a format file. */ > + SVN_ERR(svn_fs_fs__write_format(dst_fs, TRUE, pool)); > > return SVN_NO_ERROR; > } The whole idea of the non-incremental hotcopy is that it *does not* make the destination visible until the hotcopy finishes. This is how the things worked prior to this changeset and this is how the svn_repos layer currently works. By writing the format file before actually doing something we do not only expose our internal stub to the world (which really is a stub without even r0), but will also leave this internal stub openable in case the hotcopy fails. This stub does not even have to be a normal filesystem and we only need it to bootstrap the hotcopy process, but we now behave as if it was! Finally, the 'repos' and 'fs' layers now behave inconsistently in these terms, i.e. svn_repos_hotcopy3() still writes the format file only when everything is done. > + SVN_ERR(svn_fs_fs__hotcopy_prepare_target(src_fs, dst_fs, dst_path, > + incremental, pool)); > + dst_fs->fsap_data = NULL; > + > + /* Now, the destination repo should open just fine. */ > + SVN_ERR(fs_open(dst_fs, dst_path, common_pool_lock, pool, common_pool)); This is the hack I was talking about. Clearing FSAP_DATA here is only required to pass the precondition check performed by svn_fs__check_fs() within the fs_open() call. By the way, how should an arbitrary caller use the introduced svn_fs_fs__hotcopy_prepare_target() function? Should he/she *always* clear that field in order to be able to open the filesystem? > +/* Wrapper around hotcopy_body taking out all necessary source repositories. > + */ > +static svn_error_t * > +hotcopy_locking_src_body(void *baton, apr_pool_t *pool) > { > - fs_fs_data_t *src_ffd = src_fs->fsap_data; > - fs_fs_data_t *dst_ffd = dst_fs->fsap_data; > + struct hotcopy_body_baton *hbb = baton; > + fs_fs_data_t *src_ffd = hbb->src_fs->fsap_data; > > - /* The common pool and mutexes are shared between src and dst filesystems. > - * During hotcopy we only grab the mutexes for the destination, so there > - * is no risk of dead-lock. We don't write to the src filesystem. Shared > - * data for the src_fs has already been initialised in fs_hotcopy(). */ > - dst_ffd->shared = src_ffd->shared; > + return src_ffd->format >= SVN_FS_FS__MIN_PACK_LOCK_FORMAT > + ? svn_error_trace(svn_fs_fs__with_pack_lock(hbb->src_fs, hotcopy_body, > + baton, pool)) > + : hotcopy_body(baton, pool); > } Here is the potential deadlock. Say, someone runs 'svnadmin freeze svnadmin hotcopy' and at the same moment 'svnadmin pack' is executed for the repository. Packing takes the pack-lock for the whole operation, but sometimes also needs to take a nested write-lock within the pack_shard() function. This is what could happen: 1. freeze (takes the db/write-lock, inner 'hotcopy' still pending to execute) 2. pack (takes the db/pack-lock, blocks trying to acquire the db/write-lock) 3. hotcopy (blocks trying to acquire the db/pack-lock, **deadlock!**) Personally, I also find this hunk cryptic due to the usage of the non-obvious ternary operator and due to another added layer of the indirection (now svn_fs_fs__hotcopy() calls hotcopy_locking_src_body() calling hotcopy_body()). The comment is also misleading. What exactly are the "source repositories" in this context? Could you please fix these problems or entirely revert this change? Regards, Evgeny Kotkov