Hi all, On the diff-optimizations-bytes branch, in diff_file.c, there are two functions which are called for every byte of the identical prefix and suffix: increment_pointers and decrement_pointers. These functions are actually equivalents of curp++ or curp--, reading the next/previous byte, but taking into account the chunked-ness of the data (file data is read in memory in chunks).
As an experiment I changed these functions into macro's, eliminating the function calls. This makes the diff algorithm another 10% - 15% faster (granted, this was measured with my "extreme" testcase of a 1,5 Mb file (60000 lines), of which most lines are identical prefix/suffix). However, having an entire function like that implemented as a macro isn't very pretty (see below for an example). Some considerations: - Maybe I can use APR_INLINE, with similar results? - Maybe I can put just the "critical section" into a macro (the curp++ / curp-- part), and call a function when a chunk boundary is encountered (~ once every 131072 iterations (chunks are 128 Kb large)), to read in the new chunk, advancing variables, ... - Maybe it's not worth it? Thoughts? Just for kicks, here is an example of increment_pointers as a macro: [[[ #define increment_pointers(afile, file_len, pool) { \ int i; \ \ for (i = 0; i < file_len; i++) \ if (afile[i]->chunk == -1) /* indicates before beginning of file */ \ { \ afile[i]->chunk = 0; /* point to beginning of file again */ \ } \ else if (afile[i]->curp == afile[i]->endp - 1) \ { \ apr_off_t last_chunk = offset_to_chunk(afile[i]->size); \ if (afile[i]->chunk == last_chunk) \ { \ afile[i]->curp++; /* curp == endp signals end of file */ \ } \ else \ { \ apr_off_t length; \ afile[i]->chunk++; \ length = afile[i]->chunk == last_chunk ? \ offset_in_chunk(afile[i]->size) : CHUNK_SIZE; \ SVN_ERR(read_chunk(afile[i]->file, afile[i]->path, afile[i]->buffer,\ length, chunk_to_offset(afile[i]->chunk), \ pool)); \ afile[i]->endp = afile[i]->buffer + length; \ afile[i]->curp = afile[i]->buffer; \ } \ } \ else \ { \ afile[i]->curp++; \ } \ } ]]] Cheers, -- Johan