When writing an FSFS revision file some parts are written in hash order and so with a recent APR the order is not predictable. Thus loading the same dumpfile into two separate empty repositories produces different revision files. The things that vary are the order of the nodes in the file and the order of the change lines at the end of the file.
As with the dumpfile format we have never formally specified the order of a revision file so the randomn order is not strictly a bug but it might be useful if the order was at least repeatable. Things get more complicated in 1.8 as a side-effect of directory deltification. Directory deltification works best if the directory order is stable and so some hashes now use a non-APR hash function to produce a stable order. Whether or not the revision file is repeatable depends on which hashes are used. The change lines hash is still unstable but the hash returned by svn_fs_fs__rep_contents_dir can be stable or unstable depending on whether or not the has was found in the cache or created on demand. It makes me even more uneasy that how much variation is present in the revision file depends on our caching strategy. I'm considering changing the commit code so that hashes are written in a stable order and the revision files are repeatable. Does anyone think this would be a bad idea? Index: subversion/libsvn_fs_fs/fs_fs.c =================================================================== --- subversion/libsvn_fs_fs/fs_fs.c (revision 1367025) +++ subversion/libsvn_fs_fs/fs_fs.c (working copy) @@ -7367,16 +7367,19 @@ write_final_rev(const svn_fs_id_t **new_id_p, { apr_pool_t *subpool; apr_hash_t *entries, *str_entries; - apr_hash_index_t *hi; + apr_array_header_t *sorted_entries; + int i; /* This is a directory. Write out all the children first. */ subpool = svn_pool_create(pool); SVN_ERR(svn_fs_fs__rep_contents_dir(&entries, fs, noderev, pool)); - - for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi)) + sorted_entries = svn_sort__hash(entries, svn_sort_compare_items_lexically, + pool); + for (i = 0; i < sorted_entries->nelts; ++i) { - svn_fs_dirent_t *dirent = svn__apr_hash_index_val(hi); + svn_fs_dirent_t *dirent = APR_ARRAY_IDX(sorted_entries, i, + svn_sort__item_t).value; svn_pool_clear(subpool); SVN_ERR(write_final_rev(&new_id, file, rev, fs, dirent->id, @@ -7546,19 +7549,22 @@ write_final_changed_path_info(apr_off_t *offset_p, { apr_hash_t *changed_paths; apr_off_t offset; - apr_hash_index_t *hi; apr_pool_t *iterpool = svn_pool_create(pool); fs_fs_data_t *ffd = fs->fsap_data; svn_boolean_t include_node_kinds = ffd->format >= SVN_FS_FS__MIN_KIND_IN_CHANGED_FORMAT; + apr_array_header_t *sorted_changed_paths; + int i; SVN_ERR(get_file_offset(&offset, file, pool)); SVN_ERR(svn_fs_fs__txn_changes_fetch(&changed_paths, fs, txn_id, pool)); + sorted_changed_paths = svn_sort__hash(changed_paths, + svn_sort_compare_items_lexically, pool); /* Iterate through the changed paths one at a time, and convert the temporary node-id into a permanent one for each change entry. */ - for (hi = apr_hash_first(pool, changed_paths); hi; hi = apr_hash_next(hi)) + for (i = 0; i < sorted_changed_paths->nelts; ++i) { node_revision_t *noderev; const svn_fs_id_t *id; @@ -7567,8 +7573,8 @@ write_final_changed_path_info(apr_off_t *offset_p, svn_pool_clear(iterpool); - change = svn__apr_hash_index_val(hi); - path = svn__apr_hash_index_key(hi); + change = APR_ARRAY_IDX(sorted_changed_paths, i, svn_sort__item_t).value; + path = APR_ARRAY_IDX(sorted_changed_paths, i, svn_sort__item_t).key; id = change->node_rev_id; -- Certified & Supported Apache Subversion Downloads: http://www.wandisco.com/subversion/download