Re: [PATCH v2] Huge performance boost for recursion (cp, du, find, ls, rm, mv)

Jody Bruchon via busybox Wed, 01 Apr 2026 11:29:24 -0700

On 2025-08-04 1:22 PM, Denys Vlasenko wrote:

On Tue, Apr 16, 2024 at 4:10 AM Jody Bruchon <[email protected]> wrote:

This revision reduces code size in concat_path_file_fast() by 19 bytes. It also 
applies the new get_d_namlen() optimization to a readdir() loop in 
runit/svlogd.c.


If there are any changes that need to be made for inclusion then please let me 
know as soon as possible. I'd like to finish this project up. If the old/slow 
and new/fast code should be chosen with a compile-time config option then I'm 
happy to do that as well.

+#elif defined _DIRENT_HAVE_D_RECLEN
+    const size_t base = (sizeof(struct dirent) - sizeof(((struct
dirent *)0)->d_name)) - offsetof(struct dirent, d_name) - 1;
+    size_t skip;
+
+    skip = dirent->d_reclen - (sizeof(struct dirent) -
sizeof(((struct dirent *)0)->d_name));
+    if (skip > 0) skip -= base;
+    return skip + strlen(dirent->d_name + skip);

What is it doing? (Probably needs a comment).
Shouldn't this be just

+       const size_t prefix_sz = offsetof(struct dirent, d_name);
+       return de->d_reclen - prefix_sz;

?

d_reclen is the allocation size. If you do a simple offsetof()calculation, you'll very frequently count 1-7 bytes of garbage beyondthe null terminator. The math finds the last aligned chunk, then onlydoes strlen() on that chunk, skipping the work for every chunk betweenthe start of d_name (which itself sadly starts at a 32-bit shiftedoffset) and the final chunk.

+    const char *filename = dirp->d_name;
...
+    while (*filename == '/') {
+        filename++;
+        namelen--;
+    }

d_name's never contain slashes. It's not allowed in unix filenames.

concat_path_file is not only used to touch d_names. If concat_path_filedoesn't need this, then some logic needs removal in the long-timeexisting code:


char* FAST_FUNC concat_path_file(const char *path, const char *filename)
{
        char *lc;

        if (!path)
                path = "";
        lc = last_char_is(path, '/');
        while (*filename == '/')
                filename++;
        return xasprintf("%s%s%s", path, (lc==NULL ? "/" : ""), filename);
}

+    buf = (char *)malloc(end_offset + 1);
+    if (!buf) return NULL;

Use xmalloc.


Fair. Sorry.

+    if (dirp->d_name && DOT_OR_DOTDOT(dirp->d_name))
+        return NULL;

dirp->d_name is never NULL, don't check for that.

True. I can remove that part. I tend to code defensively, but I see thatcheck's lack of value here.

But the biggest problem is this: I don't see much improvement
in the time it takes to copy data.

Here I copy a Linux kernel tree, 1450M mbytes, 89679 files, on tmpfs
("ramdisk"):

$ time ./busybox_old cp -a linux linux1; rm -rf linux1; time ./busybox
cp -a linux linux1; rm -rf linux1
real    0m10.464s
user    0m1.067s   <<< almost the same
sys    0m8.558s
real    0m11.070s
user    0m0.959s   <<< almost the same
sys    0m9.204s

and the time, even on tmpfs, is heavily dominated by actual copying,
not constructing filenames.

I ran cachegrind, not time. The time command measures everything, notjust the changes. Also, you didn't sync or drop_caches between steps, sothe second time command is likely running in parallel to the async write(even in tmpfs).

Also also, cp is one of the weakest beneficiaries of the new code, asseen in the cachegrind stats, so choosing that instead of find, du, orls doesn't seem like giving the code a fair shake, especially since cpin particular is heavily I/O bound (even in a tmpfs). The find commandwent from 10.7M insns to 4.2M insns (60.7% less), the total data readsdropped by 48.5%, and the total data writes dropped by 50.8%, so whydidn't you try out find instead? Or perhaps du, which has similarly hugecuts in insns/reads/writes across the board? It's double (or more) themetadata processing speed for 255 extra bytes, and I even left theoriginal code with comment slashes so a build toggle can still be addedfor anyone who remains skeptical.

Given than this patch has been available for almost two years withminimal discussion, I'd like to either see it merged (possibly with a V3where I fix the outlined issues) or see it finally rejected so I canmaintain it as a publicly available out-of-band patch instead. I wouldlike for others to benefit from this as widely as possible. Please letme know which path you'd prefer I follow.


- Jody
_______________________________________________
busybox mailing list
[email protected]
https://lists.busybox.net/mailman/listinfo/busybox

Re: [PATCH v2] Huge performance boost for recursion (cp, du, find, ls, rm, mv)

Reply via email to