On Tue, Jan 25, 2011 at 1:37 AM, Stefan Fuhrmann <eq...@web.de> wrote:
[ ... snip ...]
> And, as promised, here some ideas how to get more
> speed from the generic code. Your latest commit:
>
> +#if SVN_UNALIGNED_ACCESS_IS_OK
> +
> +  /* Skip quickly over the stuff between EOLs. */
> +  for (i = 0, can_read_word = TRUE; i < file_len; i++)
> +    can_read_word = can_read_word
> +                    && (file[i].curp + sizeof(apr_size_t) < file[i].endp);
> +  while (can_read_word)
> +    {
> +      for (i = 1, is_match = TRUE; i < file_len; i++)
> +        is_match = is_match
> +                   && (  *(const apr_size_t *)file[0].curp
> +                      == *(const apr_size_t *)file[i].curp);
> +
> +      if (!is_match || contains_eol(*(const apr_size_t *)file[0].curp))
> +        break;
> +
> +      for (i = 0; i < file_len; i++)
> +        file[i].curp += sizeof(apr_size_t);
> +      for (i = 0, can_read_word = TRUE; i < file_len; i++)
> +        can_read_word = can_read_word
> +                        && (file[i].curp + sizeof(apr_size_t) < file[i].endp);
> +    }
> +
> +#endif
>
> could be changed to something like the following.
> Please note that I haven't tested any of this:
Thanks. There was one error in your suggestion, which I found out after
testing. See below.

> /* Determine how far we may advance with chunky ops without reaching
>  * endp for any of the files.
>  * Signedness is important here if curp gets close to endp.
>  */
> apr_ssize_t max_delta = file[0].endp - file[0].curp - sizeof(apr_size_t);
> for (i = 1; i < file_len; i++)
>   {
>     apr_ssize_t delta = file[i].endp - file[i].curp - sizeof(apr_size_t);
>     if (delta < max_delta)
>       max_delta = delta;
>   }
>
> /* the former while() loop */
> is_match = TRUE;
> for (delta = 0; delta < max_delta && is_match; delta += sizeof(apr_size_t))
>   {
>     apr_size_t chunk = *(const apr_size_t *)(file[0].curp + delta);
>     if (contains_eol(chunk))
>       break;
>
>     for (i = 1; i < file_len; i++)
>       if (chunk != *(const apr_size_t *)(file[i].curp + delta))
>         {
>           is_match = FALSE;

Here, I inserted:

            delta -= sizeof(apr_size_t);

because otherwise delta would be increased too far: it still gets bumped
by the counting expression of the outer for-loop before that loop stops
on !is_match. Maybe there is a cleaner/clearer way to break out of the
outer for-loop here, without incrementing delta again, but for now I've
committed it with this change (r1063565).

>           break;
>         }
>   }
>
> /* We either found a mismatch or an EOL at or shortly behind curp+delta
>  * or we cannot proceed with chunky ops without exceeding endp.
>  * In any way, everything up to curp + delta is equal and not an EOL.
>  */
> for (i = 0; i < file_len; i++)
>   file[i].curp += delta;

Thanks. This gives another 15-20% performance increase on my
machine/example (datasources_open time going down from ~21 s to ~17 s).

We should probably do the same for suffix scanning, but I'm too tired
right now :-) (and suffix scanning is more difficult to grok, so not a
good idea to do at 3 am).

Cheers,
--
Johan