On 2025-08-04 1:22 PM, Denys Vlasenko wrote:
On Tue, Apr 16, 2024 at 4:10 AM Jody Bruchon <[email protected]> wrote:
This revision reduces code size in concat_path_file_fast() by 19 bytes. It also 
applies the new get_d_namlen() optimization to a readdir() loop in 
runit/svlogd.c.

If there are any changes that need to be made for inclusion then please let me 
know as soon as possible. I'd like to finish this project up. If the old/slow 
and new/fast code should be chosen with a compile-time config option then I'm 
happy to do that as well.
+#elif defined _DIRENT_HAVE_D_RECLEN
+    const size_t base = (sizeof(struct dirent) - sizeof(((struct
dirent *)0)->d_name)) - offsetof(struct dirent, d_name) - 1;
+    size_t skip;
+
+    skip = dirent->d_reclen - (sizeof(struct dirent) -
sizeof(((struct dirent *)0)->d_name));
+    if (skip > 0) skip -= base;
+    return skip + strlen(dirent->d_name + skip);

What is it doing? (Probably needs a comment).
Shouldn't this be just

+       const size_t prefix_sz = offsetof(struct dirent, d_name);
+       return de->d_reclen - prefix_sz;

?

d_reclen is the allocation size. If you do a simple offsetof() calculation, you'll very frequently count 1-7 bytes of garbage beyond the null terminator. The math finds the last aligned chunk, then only does strlen() on that chunk, skipping the work for every chunk between the start of d_name (which itself sadly starts at a 32-bit shifted offset) and the final chunk.

+    const char *filename = dirp->d_name;
...
+    while (*filename == '/') {
+        filename++;
+        namelen--;
+    }

d_name's never contain slashes. It's not allowed in unix filenames.

concat_path_file is not only used to touch d_names. If concat_path_file doesn't need this, then some logic needs removal in the long-time existing code:

char* FAST_FUNC concat_path_file(const char *path, const char *filename)
{
        char *lc;

        if (!path)
                path = "";
        lc = last_char_is(path, '/');
        while (*filename == '/')
                filename++;
        return xasprintf("%s%s%s", path, (lc==NULL ? "/" : ""), filename);
}

+    buf = (char *)malloc(end_offset + 1);
+    if (!buf) return NULL;

Use xmalloc.

Fair. Sorry.

+    if (dirp->d_name && DOT_OR_DOTDOT(dirp->d_name))
+        return NULL;

dirp->d_name is never NULL, don't check for that.

True. I can remove that part. I tend to code defensively, but I see that check's lack of value here.

But the biggest problem is this: I don't see much improvement
in the time it takes to copy data.

Here I copy a Linux kernel tree, 1450M mbytes, 89679 files, on tmpfs
("ramdisk"):

$ time ./busybox_old cp -a linux linux1; rm -rf linux1; time ./busybox
cp -a linux linux1; rm -rf linux1
real    0m10.464s
user    0m1.067s   <<< almost the same
sys    0m8.558s
real    0m11.070s
user    0m0.959s   <<< almost the same
sys    0m9.204s

and the time, even on tmpfs, is heavily dominated by actual copying,
not constructing filenames.
I ran cachegrind, not time. The time command measures everything, not just the changes. Also, you didn't sync or drop_caches between steps, so the second time command is likely running in parallel to the async write (even in tmpfs).

Also also, cp is one of the weakest beneficiaries of the new code, as seen in the cachegrind stats, so choosing that instead of find, du, or ls doesn't seem like giving the code a fair shake, especially since cp in particular is heavily I/O bound (even in a tmpfs). The find command went from 10.7M insns to 4.2M insns (60.7% less), the total data reads dropped by 48.5%, and the total data writes dropped by 50.8%, so why didn't you try out find instead? Or perhaps du, which has similarly huge cuts in insns/reads/writes across the board? It's double (or more) the metadata processing speed for 255 extra bytes, and I even left the original code with comment slashes so a build toggle can still be added for anyone who remains skeptical.

Given than this patch has been available for almost two years with minimal discussion, I'd like to either see it merged (possibly with a V3 where I fix the outlined issues) or see it finally rejected so I can maintain it as a publicly available out-of-band patch instead. I would like for others to benefit from this as widely as possible. Please let me know which path you'd prefer I follow.

- Jody
_______________________________________________
busybox mailing list
[email protected]
https://lists.busybox.net/mailman/listinfo/busybox

Reply via email to