On 03.01.2011 02:14, Johan Corveleyn wrote:
On Mon, Jan 3, 2011 at 12:13 AM, Johan Corveleyn <jcor...@gmail.com> wrote:
On Sun, Jan 2, 2011 at 10:04 PM, Johan Corveleyn <jcor...@gmail.com> wrote:
On Sat, Jan 1, 2011 at 11:25 PM, Stefan Fuhrmann
<stefanfuhrm...@alice-dsl.de> wrote:
Hi Johan,

Thursday night I did something stupid and had a look at how
svn blame could be made faster based on the HEAD code in
your branch.

One night and most of the following day later, I think I've made it
a few percent faster. Some of the code I committed directly to /trunk,
so you will need to pull those changes into your branch to compile the
attached patch.

Feel free to test it, take it apart and morph it any way you like --
I know the patch isn't very pretty in the first place. I tested the
patch on x64 LINUX and would like to hear whether it at least
somewhat improved performance on your system for your
svn blame config.xml use-case.

-- Stefan^2.

[[[
Speed-up of datasource_open() and its sub-routines
by a series of optimizations:

* allocate the file[] array on stack instead of heap
  (all members become directly addressable through
  the stack pointer because all static sub-functions
  will actually be in-lined)
* minor re-arrangements in arithmetic expressions to
  maximize reuse of results (e.g. in INCREMENT_POINTERS)
* move hot spots to separate functions and provide a
  specialized version for file_len == 2
  (many loops collapse to a single execution, other
  values can be hard-coded, etc.)
* use separate loops to process data within a chunk
  so we don't need to call INCREMENT_POINTERS & friends
  that frequently
* scan / compare strings in machine-word granularity
  instead of bytes (sketched just below)
]]]
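
To illustrate the last point, here is a minimal sketch of word-wise
prefix comparison. The function and buffer names are made up for
illustration only; the actual patch operates on the diff datasource
chunks rather than plain buffers:

[[[
#include <string.h>
#include <stddef.h>

/* Return the length of the common prefix of BUF1 and BUF2 (both LEN
 * bytes long), comparing one machine word at a time and falling back
 * to single bytes to locate the exact mismatch. */
static size_t
common_prefix_len(const char *buf1, const char *buf2, size_t len)
{
  typedef unsigned long word_t;
  const size_t word_size = sizeof(word_t);
  size_t i = 0;
  word_t w1, w2;

  /* Step over whole machine words while they are equal.
   * memcpy avoids unaligned-access problems. */
  while (i + word_size <= len)
    {
      memcpy(&w1, buf1 + i, word_size);
      memcpy(&w2, buf2 + i, word_size);
      if (w1 != w2)
        break;
      i += word_size;
    }

  /* Locate the exact mismatch (or the end) byte by byte. */
  while (i < len && buf1[i] == buf2[i])
    i++;

  return i;
}
]]]
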
Hi Stefan,

Thanks for taking a look. I really appreciate it.

When I saw your first couple of commits, to the adler32 stuff, I
already thought: hmmm, he's up to something :-). And after I saw your
change to eol.c#svn_eol__find_eol_start, reading per machine-word, I
was thinking: hey, I could really use something like that in the
prefix/suffix scanning. Nice ... :-)

(I had some trouble applying your patch: it doesn't apply cleanly
anymore to HEAD of the branch. But most hunks applied correctly, and I
could fix up the rest manually, so no problem.)

However, first tests are not so great. In fact, it's about 25% slower
(when blaming my settings.xml file with 2200 revisions, it's spending
about 90 seconds in diff, vs. around 72 with HEAD of
diff-optimizations-bytes).

Analyzing further, I can see it's spending significantly less time in
prefix/suffix scanning, but more time in token.c#svn_diff__get_tokens
(which is where the original algorithm gets the tokens/lines and
inserts them into the "token tree"). This tells me something is wrong:
either prefix/suffix scanning gives up too soon, leaving much more work
for the regular algorithm, or it's simply not functioning correctly.

Looking at the result of the blame operation, it seems it's the
latter. The final result of the blame is not correct anymore.

I'll try to look some more into it, to see what's going wrong. Maybe
you can also see it with a simple diff of a large file ... (for a good
example in svn's own codebase, you could try
subversion/tests/cmdline/merge-tests.py (pretty large, and almost 700
revisions)).
Ok, by eliminating parts of the new stuff, I found out the problem is
in scan_backward_fast (part of suffix scanning). If I bypass that, and
always take scan_backward_slow, it works.

And it's fast too! It's taking only 58 seconds in "diff", vs. 72 for
the normal version. Splitting that up further, it's taking only 28
seconds in prefix/suffix scanning, vs. ~47 for the normal version (for
some reason the LCS processing took a little longer, but that's
probably unrelated; I only did a single run). And that's with
scan_backward_slow. That's pretty amazing.

The code of scan_backward_fast is pretty difficult to understand for
me. So I'm not sure if I can spot the problem. I'll continue staring
at it for a little while ...
Ok, I can't find it right now. There must be some functional
difference between scan_backward_fast and scan_backward_slow.

For now, some feedback on the rest of the patch:

[[[

-          DECREMENT_POINTERS(file, file_len, pool);
-        }
+        DECREMENT_POINTERS(file, file_len, pool);
      }

+  return SVN_NO_ERROR;
+}
+
+/* Find the prefix which is identical between all elements of the FILE array.
+ * Return the number of prefix lines in PREFIX_LINES.  REACHED_ONE_EOF will be
+ * set to TRUE if one of the FILEs reached its end while scanning prefix,
+ * i.e. at least one file consisted entirely of prefix.  Otherwise,
+ * REACHED_ONE_EOF is set to FALSE.
+ *
+ * After this function is finished, the buffers, chunks, curp's and endp's
+ * of the FILEs are set to point at the first byte after the prefix. */
+static svn_error_t *
+find_identical_prefix(svn_boolean_t *reached_one_eof, apr_off_t *prefix_lines,
+                      struct file_info file[], apr_size_t file_len,
+                      apr_pool_t *pool)
+{
+  if (file_len == 2)
+    SVN_ERR(find_identical_prefix_fast(reached_one_eof, prefix_lines,
+                                       file, pool));
+  else
+    SVN_ERR(find_identical_prefix_slow(reached_one_eof, prefix_lines,
+                                       file, file_len, pool));
+
Small problem here (not really visible in practice, but a potential
change in behavior nonetheless): if either of the above function calls
exited early because of reached_all_eof, this function should also exit
early.
Yep, that is just an artifact caused by moving code
into a sub-function.
If both (all) files reached their end, they are completely identical,
and it makes no sense to roll back the last (incomplete) line.

So maybe the check for reached_all_eof should be pulled out of those
functions and into this one? But then you'll also need to move the
ended_at_nonmatching_newline handling, or at least pass the had_cr
information back up. Maybe you've considered that already ...
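
Roughly, I'm imagining something like this (the extra out-parameters
for the _fast/_slow variants are just a guess at what would be needed;
only a sketch, not tested):

[[[
static svn_error_t *
find_identical_prefix(svn_boolean_t *reached_one_eof, apr_off_t *prefix_lines,
                      struct file_info file[], apr_size_t file_len,
                      apr_pool_t *pool)
{
  svn_boolean_t reached_all_eof;
  svn_boolean_t had_cr;

  if (file_len == 2)
    SVN_ERR(find_identical_prefix_fast(reached_one_eof, &reached_all_eof,
                                       &had_cr, prefix_lines, file, pool));
  else
    SVN_ERR(find_identical_prefix_slow(reached_one_eof, &reached_all_eof,
                                       &had_cr, prefix_lines, file, file_len,
                                       pool));

  /* If all files ended inside the prefix, they are completely identical,
     so there is no (incomplete) last line to roll back. */
  if (reached_all_eof)
    return SVN_NO_ERROR;

  /* ... otherwise roll back to the last end-of-line here, using had_cr
     to handle a possible \r\n split ... */

  return SVN_NO_ERROR;
}
]]]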

The rest looks fine from a correctness standpoint (and tests
correctly), except for scan_backward_fast.

Also, it would really help if you could split up this patch (though
this was certainly a good way to try it out and give me an idea of the
possibilities).
Giving you some ideas was the main point of it.

If I find some time, I will try to reduce the patch
(no fast / slow separation). But since the heap
to stack conversion spreads across all of the
code, it is hard to split the patch.

It would be interesting to see where the biggest gains are coming
from. I'm guessing the "per-machine-word" reading/comparing; I'd like
to try that first, maybe together with allocating the file[] array on
the stack. I'd like to avoid the special-casing of file_len == 2 and
the splitting of functions into *_slow and *_fast variants, because
that makes the code a lot harder to follow IMHO (so I'd only do that
if it turns out to be a significant contributor). But if you want to
leave that as an exercise for the reader, that's fine as well :-).
Exercise is certainly not a bad thing ;)

But I think the stack-based file[] array is certainly helpful and easy
to do. While "chunky" (word-at-a-time) operation gives a *big* boost,
it is much more difficult to code when you need to compare multiple
sources. It should be possible, though.
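
For the general multi-source case, something along these lines might
work (purely illustrative and untested; the helper and its parameters
are made up for this sketch and are not part of the patch):

[[[
#include <string.h>
#include <stddef.h>

/* Advance all CURP[0..N_FILES-1] past the longest common prefix of the
 * current chunks.  *LEN is the number of bytes known to be available
 * in every chunk and is reduced by the number of bytes skipped. */
static void
skip_common_prefix(const char *curp[], size_t n_files, size_t *len)
{
  typedef unsigned long word_t;
  const size_t word_size = sizeof(word_t);
  size_t i;
  int match = 1;

  /* Word-wise: advance while every file agrees on the next whole word. */
  while (match && *len >= word_size)
    {
      for (i = 1; match && i < n_files; i++)
        match = (memcmp(curp[0], curp[i], word_size) == 0);
      if (match)
        {
          for (i = 0; i < n_files; i++)
            curp[i] += word_size;
          *len -= word_size;
        }
    }

  /* Byte-wise tail: locate the exact first mismatching byte. */
  match = 1;
  while (match && *len > 0)
    {
      for (i = 1; match && i < n_files; i++)
        match = (*curp[0] == *curp[i]);
      if (match)
        {
          for (i = 0; i < n_files; i++)
            curp[i]++;
          --*len;
        }
    }
}
]]]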

The great advantage of the file_len==2 case is
that this is the only time-critical operation (a
true merge over long files is rare). Also, many
constructs collapse if the count is fixed to 2:

for (i = 1, is_match = TRUE; i < file_len; i++)
  is_match = is_match && *file[0].curp == *file[i].curp;
while (is_match)
{
  ...
  for (i = 1, is_match = TRUE; i < file_len; i++)
    is_match = is_match && *file[0].curp == *file[i].curp;
}

becomes

while (*file[0].curp == *file[1].curp)
{
  ...
}


But basically, we will need to run some more tests.

-- Stefan^2.
