Patrick Georgi <[email protected]> writes: > Am 25.07.2009 04:07, schrieb Stephen Leake: >> Patrick Georgi<[email protected]> writes: >> >>> The current code takes two rosters, compares them, and registers the >>> differences. The problem is that this approach scales by roster size >>> (which is directly dependent on the tree size). The problem is, that I'm >>> working with trees with 50000 files - comparing all of their entries >>> just to find the 5 that actually changed is quite a waste of time. >>> As a test, I disabled the roster deltification code, and got quite some >>> speedup (pushing 10 revisions with put_revision in approx. half the >>> time), so there's definitely room for improvement. >>> >> I think the main motivation for roster_deltas is database size. If we >> can store a new revision as an old revision plus a small delta, that's >> a significant size saving. As with anything that saves size, it costs >> computation time. Which doesn't mean the current trade-off is optimal. >> Disk space is certainly cheaper than it used to be. But so is CPU time >> :). >> Perhaps a user tuning option could be provided? >> > The problem in this case, that large trees affect both of them very > negatively. When I did the profiling, I disabled roster_delta > generation (#if 0/#endif around a couple of lines in database.cc) to > see if I'm really on the right track. > But this "optimization", while reasonable for small trees, really > hurts large trees, too. Every roster contains information about 50000 > files, and that sums up.
Which suggests a possible space savings; roster_delta could compute the change in the manifest as well, so each revision doesn't need to store the full manifest. Hmm. I don't think the manifests are actually stored anywhere. Apparently they used to be, but not anymore. So it looks like that optimization has already been done. > Yes, but look how it's done. It's in roster_delta.cc, > make_roster_delta_t(...): > It creates a datastructure containing information about _all_ files in > the roster (ie. all 50000 in my case), and then loops over them and > notes the differences. Ok; that's 50000 iterations. I just wouldn't call them "lookups", since they don't fetch stuff from the database unless there is a change. > The cset already contains a list of all changes between the two > revisions, so the code could walk a changeset to figure out which > roster entries to handle. As most changesets are rather small, that > should give a nice speed up. Yes, that makes sense. -- -- Stephe _______________________________________________ Monotone-devel mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/monotone-devel
