On Fri, Sep 2, 2011 at 12:35, Philip Martin <philip.mar...@wandisco.com> wrote: > Philip Martin <philip.mar...@wandisco.com> writes: > >> The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7 >> >> SELECT local_relpath, op_depth, presence, kind >> FROM nodes >> WHERE wc_id = ?1 AND parent_relpath = ?2 >> GROUP BY local_relpath >> ORDER BY op_depth DESC >> >> performs poorly and doesn't scale well with working copy size. > > After some more discussion in IRC it seems that the above SQL is > dubious: the selected columns may not all come from the same row, and we > may not get the highest op_depth. Using NODES_CURRENT would fix the > problem as does my C solution, and the combination of the two is now on > trunk. So I'll be proposing r1164426 and r1164614 to fix the 1.7 > performance problem without any schema changes.
There are some differences in wc_db.c after your two changes: $ svn diff -r HEAD wc_db.c -x -p Index: wc_db.c =================================================================== --- wc_db.c (revision 1164712) +++ wc_db.c (working copy) @@ -7005,8 +7005,7 @@ struct read_children_info_baton_t apr_pool_t *result_pool; }; -/* What we really want to store about a node. This relies on the - offset of svn_wc__db_info_t being zero. */ +/* What we really want to store about a node */ struct read_children_info_item_t { struct svn_wc__db_info_t info; @@ -7443,6 +7442,7 @@ svn_wc__db_read_children_walker_info(apr_hash_t ** const char *dir_relpath; svn_sqlite__stmt_t *stmt; svn_boolean_t have_row; + apr_int64_t op_depth; SVN_ERR_ASSERT(svn_dirent_is_absolute(dir_abspath)); @@ -7462,10 +7462,13 @@ svn_wc__db_read_children_walker_info(apr_hash_t ** struct svn_wc__db_walker_info_t *child; const char *child_relpath = svn_sqlite__column_text(stmt, 0, NULL); const char *name = svn_relpath_basename(child_relpath, NULL); - apr_int64_t op_depth = svn_sqlite__column_int(stmt, 1); svn_error_t *err; - child = apr_palloc(result_pool, sizeof(*child)); + child = apr_hash_get(*nodes, name, APR_HASH_KEY_STRING); + if (child == NULL) + child = apr_palloc(result_pool, sizeof(*child)); + + op_depth = svn_sqlite__column_int(stmt, 1); child->status = svn_sqlite__column_token(stmt, 2, presence_map); if (op_depth > 0) { Since we are selecting only "current nodes" for each local_relpath, then I presume we don't need the apr_hash_get() in there, correct? (and I already know that we can eliminate the apparent diffs around op_depth with no harm) I'd really like to put the code back to pre-r1164426 if at all possible (makes it clearer what we're voting for in STATUS). Cheers, -g