Re: [Monotone-devel] Roster handling in the code

Stephen Leake Sat, 25 Jul 2009 02:56:46 -0700

Patrick Georgi <[email protected]> writes:

> Am 25.07.2009 04:07, schrieb Stephen Leake:
>> Patrick Georgi<[email protected]>  writes:
>>    
>>> The current code takes two rosters, compares them, and registers the
>>> differences. The problem is that this approach scales by roster size
>>> (which is directly dependent on the tree size). The problem is, that I'm
>>> working with trees with 50000 files - comparing all of their entries
>>> just to find the 5 that actually changed is quite a waste of time.
>>> As a test, I disabled the roster deltification code, and got quite some
>>> speedup (pushing 10 revisions with put_revision in approx. half the
>>> time), so there's definitely room for improvement.
>>>      
>> I think the main motivation for roster_deltas is database size. If we
>> can store a new revision as an old revision plus a small delta, that's
>> a significant size saving. As with anything that saves size, it costs
>> computation time. Which doesn't mean the current trade-off is optimal.
>> Disk space is certainly cheaper than it used to be. But so is CPU time
>> :).
>> Perhaps a user tuning option could be provided?
>>    
> The problem in this case, that large trees affect both of them very
> negatively. When I did the profiling, I disabled roster_delta
> generation (#if 0/#endif around a couple of lines in database.cc) to
> see if I'm really on the right track.
> But this "optimization", while reasonable for small trees, really
> hurts large trees, too. Every roster contains information about 50000
> files, and that sums up.


Which suggests a possible space savings; roster_delta could compute the
change in the manifest as well, so each revision doesn't need to store
the full manifest.

Hmm. I don't think the manifests are actually stored anywhere.
Apparently they used to be, but not anymore. So it looks like that
optimization has already been done.

> Yes, but look how it's done. It's in roster_delta.cc,
> make_roster_delta_t(...):
> It creates a datastructure containing information about _all_ files in
> the roster (ie. all 50000 in my case), and then loops over them and
> notes the differences.

Ok; that's 50000 iterations. I just wouldn't call them "lookups",
since they don't fetch stuff from the database unless there is a change.

> The cset already contains a list of all changes between the two
> revisions, so the code could walk a changeset to figure out which
> roster entries to handle. As most changesets are rather small, that
> should give a nice speed up.

Yes, that makes sense.

-- 
-- Stephe


_______________________________________________
Monotone-devel mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/monotone-devel

Re: [Monotone-devel] Roster handling in the code

Reply via email to