On Fri, Jan 28, 2011 at 5:51 AM, Hyrum K Wright <hy...@hyrumwright.org> wrote: > On Tue, Jan 25, 2011 at 8:31 PM, Johan Corveleyn <jcor...@gmail.com> wrote: >> On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <hy...@hyrumwright.org> >> wrote: >>> Johan (and other interested parties), >>> I've been following some of the commits to the >>> diff-optimizations-branch with interest. While I've not reviewed them >>> for technical merit, it appears that others have, and that there is >>> good work going on in the branch. >> >> Thanks for your interest. >> >> It might be good to get a little bit more review on the whole, I >> think. A lot of people have read parts of the code, but if I remember >> correctly most (if not all) of them have said things like "I haven't >> reviewed it in detail, but here are some suggestions, feedback, ...". >> >>> I'm wondering what the potential >>> for merging back to trunk is. This is the TODO section from the >>> BRANCH-README: >>> >>> TODO: >>> >>> * eliminate identical prefix [DONE] >>> * eliminate identical suffix [DONE] >>> * make diff3 and diff4 use datasources_open [DONE] >>> (this may eliminate the need for datasource_open, and the flag >>> datasource_opened in token.c#svn_diff__get_tokens) >>> * implement (part of) increment_pointers and >>> decrement_pointers with a macro [DONE] >>> (offers another 10% speedup) >>> * integrate (some of the) low-level optimizations for prefix/suffix >>> scanning, proposed by stefan2 [2] [] >>> * revv svn_diff.h#svn_diff_fns_t [] >>> >>> It looks like, for the most part, any destabilizing functionality is >>> completed, and what remains are simply optimizations. This >>> optimization work can probably be performed on trunk, and if so, we >>> should merge the branch back and do the cleanup work there. The only >>> bit on the current list of stuff is revving svn_diff_fns_t. Can that >>> work be parallelized? >> >> Yes, you are correct. Most of the essential work is done. Further >> optimizations can just as well be done on trunk. >> >> Revving svn_diff_fns_t: what do you mean with parallelizing it? I must >> admit that I don't really know (yet) how to go about that. Very early >> during the branch work, danielsh pointed out that I modified this >> public struct (vtable for reading data from datasources), so it should >> be revved. Is it listed/documented somewhere what should be done for >> that (community guide perhaps)? >> >> (one slightly related note to this: I introduced the function >> datasources_open, to open the multiple datasources at once (as opposed >> to the original datasource_open, which only opens one datasource). But >> maybe the new function should be renamed to datasource_open_all or >> datasources_open_all or something, to make it stand out a little bit >> more). >> >>> I'm not trying to rush the work, nor do I think it is essential for >>> 1.7, but it feels like a pretty good performance increase, and one >>> that we shouldn't have any problem shipping. (If there are currently >>> API uncertainties, than it would be good to wait until 1.7.x branches, >>> though.) >> >> Yes, I think it's quite ready to be merged, and could provide a very >> significant performance increase (for diff, merge and blame). >> >> The API is stable now, I think, except maybe for the name of the >> datasources_open function (see above). If we decide to go (for >> optimizations reasons) for specialized prefix/suffix scanning >> functions for 2, 3 or 4 datasources, I think it's best to keep the >> generic datasources_open API (with an array of datasources), and only >> split up between specialized versions in the implementation. > > The API is now rev'd, and I caught the branch up with trunk in > r1064459. So it looks like we're ready to merge! > > Johan, would you like to do the honors?
Thanks. I'm not sure: shouldn't we wait for a little bit more review, or can that happen after integration on trunk (or reviewing the reintegration commit itself or something)? E.g. stefan2 said he was going to take a look during his travel time next week. And one more thing came to mind with regards to the new API (datasources_open function): currently it only supports up to 4 datasources, so not an arbitrary number of datasources (the implementation in diff_file.c assumes that anyway, because it has to use some local array variables, so needs a fixed array length). I guess that's ok, because existing code in diff_file.c also already assumes that (e.g. the svn_diff__file_baton_t struct, containing an array of 4 file_info structs). And of course, there are currently no know usages of that API with more than 4 datasources (diff, diff3 and diff4). Should this restriction be documented in the docstring of datasources_open withing svn_diff.h#svn_diff_fns2_t or something? Cheers, -- Johan