On 05/13/2018 01:57 AM, Devin Hussey wrote:
> Subject: [PATCH] Significant performance improvements (at least on macOS).
> find and locate:
> -> Reordered struct members to reduce padding and slightly reduce
> memory usage.
> -> visit_locate02_format
> 1. Forced Gnulib to use its implementation, it is much faster,
> especially when compared to macOS.
> 2. The previous implementation would cause getdelim() to realloc
> and then have that memory immediately freed once per file.
> - The buffer to be passed to getdelim() is now in a top-level
> static, with a respective size field.
> - getdelim() takes a minimum size by ref and will automatically
> resize the buffer when needed, updating that value with its
> new size.
> - By storing this and merely memsetting it to 0, we can
> recycle it.
> - visit_substring_match_nocasefold_buffer
> 1. By default, I replaced the strstr implementation with the
> fast_strstr implementation by Raphael Javaux (BSD 3-Clause,
> - It can be toggled in the configure script. I couldn't
> submodule it because it was not compatible with c90
> (it had // comments and mixed variable declarations),
> so I had to edit it slightly.
> - I also used "restrict" on the declaration, which recommends
> optimizations. Autoconf should gracefully no-op it on
> incompatible compilers.
> - Feel free to remove this, but from my tests, this can shave
> a good half second off the test below.
> 2. I also forced enabling the Gnulib memchr(), as it is also
> much faster than the system implementation if you are using
> Gnulib's strstr.
> before: locate -c file: 3.99s
> after: locate -c file: 2.19s
Thanks for your work so far.
However, I have some qualms:
First of all, performance improvements should *never* be done in
one patch, because the effect of one change might influence or
even null-ify the other. There was some steep learning-curve recently
on the gnulib mailing list starting here  with more than
Second, it seems you only measured on one platform, MacOS. A
general-purpose project like GNU findutils supports many platforms,
so the measurements at least have to proof that a change would not
lead to performance penalties on the other platforms.
Someone said: "premature optimization is the source of all evil".
Regarding a different implementation for strstr(): if there's really
faster implementation than that provided by the system, then IMO that
should go there; in this case into MacOS' system libraries ... or if
it's general enough for more platforms, then gnulib may be an alternative
Furthermore, I'm unsure yet whether the source code under the
"open-source BSD3 license" can be incorporated into a GNU project.
WRT the realloc change: this is IMO actually something to consider,
still in a separate patch (with measurement results from various
Finally, the changes are big enough that they require to undergo
the legal procedure for the copyright assignment to the FSF.
Have a nice day,