On Fri, Jan 28, 2011 at 6:49 AM, Johan Corveleyn <jcor...@gmail.com> wrote: > On Fri, Jan 28, 2011 at 5:51 AM, Hyrum K Wright <hy...@hyrumwright.org> wrote: >> On Tue, Jan 25, 2011 at 8:31 PM, Johan Corveleyn <jcor...@gmail.com> wrote: >>> On Tue, Jan 25, 2011 at 4:58 PM, Hyrum K Wright <hy...@hyrumwright.org> >>> wrote: >>>> Johan (and other interested parties), >>>> I've been following some of the commits to the >>>> diff-optimizations-branch with interest. While I've not reviewed them >>>> for technical merit, it appears that others have, and that there is >>>> good work going on in the branch. >>> >>> Thanks for your interest. >>> >>> It might be good to get a little bit more review on the whole, I >>> think. A lot of people have read parts of the code, but if I remember >>> correctly most (if not all) of them have said things like "I haven't >>> reviewed it in detail, but here are some suggestions, feedback, ...". >>> >>>> I'm wondering what the potential >>>> for merging back to trunk is. This is the TODO section from the >>>> BRANCH-README: >>>> >>>> TODO: >>>> >>>> * eliminate identical prefix [DONE] >>>> * eliminate identical suffix [DONE] >>>> * make diff3 and diff4 use datasources_open [DONE] >>>> (this may eliminate the need for datasource_open, and the flag >>>> datasource_opened in token.c#svn_diff__get_tokens) >>>> * implement (part of) increment_pointers and >>>> decrement_pointers with a macro [DONE] >>>> (offers another 10% speedup) >>>> * integrate (some of the) low-level optimizations for prefix/suffix >>>> scanning, proposed by stefan2 [2] [] >>>> * revv svn_diff.h#svn_diff_fns_t [] >>>> >>>> It looks like, for the most part, any destabilizing functionality is >>>> completed, and what remains are simply optimizations. This >>>> optimization work can probably be performed on trunk, and if so, we >>>> should merge the branch back and do the cleanup work there. The only >>>> bit on the current list of stuff is revving svn_diff_fns_t. Can that >>>> work be parallelized? >>> >>> Yes, you are correct. Most of the essential work is done. Further >>> optimizations can just as well be done on trunk. >>> >>> Revving svn_diff_fns_t: what do you mean with parallelizing it? I must >>> admit that I don't really know (yet) how to go about that. Very early >>> during the branch work, danielsh pointed out that I modified this >>> public struct (vtable for reading data from datasources), so it should >>> be revved. Is it listed/documented somewhere what should be done for >>> that (community guide perhaps)? >>> >>> (one slightly related note to this: I introduced the function >>> datasources_open, to open the multiple datasources at once (as opposed >>> to the original datasource_open, which only opens one datasource). But >>> maybe the new function should be renamed to datasource_open_all or >>> datasources_open_all or something, to make it stand out a little bit >>> more). >>> >>>> I'm not trying to rush the work, nor do I think it is essential for >>>> 1.7, but it feels like a pretty good performance increase, and one >>>> that we shouldn't have any problem shipping. (If there are currently >>>> API uncertainties, than it would be good to wait until 1.7.x branches, >>>> though.) >>> >>> Yes, I think it's quite ready to be merged, and could provide a very >>> significant performance increase (for diff, merge and blame). >>> >>> The API is stable now, I think, except maybe for the name of the >>> datasources_open function (see above). If we decide to go (for >>> optimizations reasons) for specialized prefix/suffix scanning >>> functions for 2, 3 or 4 datasources, I think it's best to keep the >>> generic datasources_open API (with an array of datasources), and only >>> split up between specialized versions in the implementation. >> >> The API is now rev'd, and I caught the branch up with trunk in >> r1064459. So it looks like we're ready to merge! >> >> Johan, would you like to do the honors? > > Thanks. > > I'm not sure: shouldn't we wait for a little bit more review, or can > that happen after integration on trunk (or reviewing the reintegration > commit itself or something)? E.g. stefan2 said he was going to take a > look during his travel time next week.
Sounds good. > And one more thing came to mind with regards to the new API > (datasources_open function): currently it only supports up to 4 > datasources, so not an arbitrary number of datasources (the > implementation in diff_file.c assumes that anyway, because it has to > use some local array variables, so needs a fixed array length). I noticed these variables. It seems that we should get rid of the magic number ('4'), and replace it either with a sizeof() or a macro from somewhere. Not sure what the best approach is, but arbitrary numbers tickle my spidey sense. > I guess that's ok, because existing code in diff_file.c also already > assumes that (e.g. the svn_diff__file_baton_t struct, containing an > array of 4 file_info structs). And of course, there are currently no > know usages of that API with more than 4 datasources (diff, diff3 and > diff4). > > Should this restriction be documented in the docstring of > datasources_open withing svn_diff.h#svn_diff_fns2_t or something? Sounds reasonable. -Hyrum