Julian Foad wrote on Wed, Dec 01, 2010 at 13:06:04 +0000:
> Daniel Shahaf wrote:
> > So we loop over the remaining sha1's and remove each of them...
> > I wonder if there is room for further optimization here?  e.g., does
> > this prepare/reset the statement just once, or once per iteration?
>
> Each iteration of this loop prepares, uses and resets a SQL statement,
> and also removes a pristine file from disk.  So yes there is room for
> further optimization of the SQL part of that.
>

Thanks for the overview.

> The main concern I was addressing was that the previous method was
> *quadratic* in the total number of pristines in the store, because for
> each one in the store it would scan the NODES and ACTUAL_NODE tables
> looking for a reference to it.  I had noticed that even a no-op cleanup
> took a very long time on a large WC.  It will help if I show some real
> timings.
>
> Wall clock times for "svn cleanup" on a clean checkout of
> ^/subversion/branc...@1040943 on my Linux system.
>
> r1040662 build: first time = 15 minutes, second = 14.8 minutes.
>
> r1040663 build: first time = 4.4s, best of many repetitions = 0.7s.
>

:-)!

> Now the algorithm is only linear time, which is a *huge* win.  A
> 'cleanup' operation doesn't need to be blisteringly fast, so I don't
> think it needs more optimisation.
>
> I've edited the log message to clarify the main point, and to point out
> the big-WC timing improvement.
>

Fair enough; let's table this additional optimization for now.  We can
always add it later if needed (i.e., if the time spent removing
unreferenced pristines should become an issue).

> - Julian
>
>
> # r1040662 build
> $ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
> real 15m4.962s
> user 9m0.306s
> sys 6m3.967s
>
> # r1040663 build
> $ time ~/build/subversion-c/subversion/svn/svn cleanup branches/
> real 0m0.708s
> user 0m0.436s
> sys 0m0.212s
>
>

Thanks,
Daniel
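
P.S. For the archives, here is a rough sketch of the two shapes discussed in this thread, written in Python with sqlite3 rather than against the real wc.db code. The schema is a simplified assumption (a single checksum column per table, no index on it), all function names are hypothetical, and this is not necessarily how r1040663 does it. The first function scans NODES and ACTUAL_NODE once per pristine; the second lets SQLite compute the unreferenced set in one query; the third deletes the result through executemany(), which reuses one statement instead of preparing it on every iteration.

```python
import sqlite3


def make_store(n_pristines, n_referenced):
    """Build an in-memory DB with a simplified, hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE pristine (checksum TEXT PRIMARY KEY);
        CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, checksum TEXT);
        CREATE TABLE actual_node (local_relpath TEXT PRIMARY KEY, checksum TEXT);
    """)
    db.executemany("INSERT INTO pristine VALUES (?)",
                   [(f"sha1-{i:06d}",) for i in range(n_pristines)])
    # Only the first n_referenced pristines are referenced by a node.
    db.executemany("INSERT INTO nodes VALUES (?, ?)",
                   [(f"file{i}", f"sha1-{i:06d}") for i in range(n_referenced)])
    db.commit()
    return db


def unreferenced_per_pristine(db):
    """One lookup in both tables for every pristine.

    With no index on the checksum columns, each lookup is a table scan,
    so the total work grows with (pristines x rows).
    """
    result = []
    for (checksum,) in db.execute("SELECT checksum FROM pristine").fetchall():
        hit = db.execute(
            "SELECT 1 FROM nodes WHERE checksum = ? "
            "UNION ALL "
            "SELECT 1 FROM actual_node WHERE checksum = ? "
            "LIMIT 1",
            (checksum, checksum)).fetchone()
        if hit is None:
            result.append(checksum)
    return result


def unreferenced_single_pass(db):
    """Let SQLite compute the set difference in a single query."""
    rows = db.execute("""
        SELECT checksum FROM pristine
        EXCEPT SELECT checksum FROM nodes
        EXCEPT SELECT checksum FROM actual_node
    """).fetchall()
    return [checksum for (checksum,) in rows]


def remove_unreferenced(db, checksums):
    """Delete all given rows, reusing one DELETE statement for the batch."""
    db.executemany("DELETE FROM pristine WHERE checksum = ?",
                   [(c,) for c in checksums])
    db.commit()
```

Calling make_store(1000, 900) and comparing the two unreferenced_* functions should give the same set of checksums; only the amount of work differs. Removing the pristine files from disk is left out of the sketch.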