On Sat, 2023-03-04 at 14:58 -0800, Paul Eggert wrote:
> What's the motivation here? Does this have something to do with
> reproducible builds?

No, nothing with reproducibility - at least not from my side.

It's really just to get a number for the "actual" data. And yes, it's
clear that one can argue what that actually is ;-) ... but at least I
think it should give the same totals for the same files (of any type)
on any filesystem.

> One possibility is for --apparent-size to always count 0 for
> directories, since 'read' never returns a positive number on
> directories. That is, we reinterpret --apparent-size to mean "bytes
> that could be read" rather than "what st_size says".

Sounds like it has good potential for breaking existing stuff.

And in a way it solves the fundamental problem only partially:

As said above, it's not even clear what "actual" or "pristine" data
should actually be.
I would say that it's at least independent of any underlying
structures (like metadata of a filesystem or e.g. header data in a
tar archive).

But would symlinks (i.e. their length) count for it? What about
hardlinked files, would they count once or n times?

du already allows selecting what it should do for hard links (-l), so
I figured it would fit conceptually if it allowed the same for file
types.

E.g. with a --type option that takes a string of (1-n) letters like
find:
  b  block (buffered) special
  c  character (unbuffered) special
  d  directory
  p  named pipe (FIFO)
  f  regular file
  l  symbolic link
  s  socket
  D  door (Solaris)

If --type is given, only the files with the given letters are counted
(but it has no effect on whether such files are followed or recursed
into, in the case of d or l).

But anyway... as said previously... I already have my script that
does more or less what I want. So if you think the whole idea is
overkill for du, then don't hesitate to close as wontfix.

Cheers,
Chris.
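
For illustration only, the --type idea above could be sketched roughly
as follows in Python. This is not du's behaviour and not the script
mentioned above; the names apparent_size and TYPE_TESTS are made up
for the example. It assumes that every name is counted (so hard links
count n times, as with du -l), that symlinks are never followed, and
that st_size from lstat is what gets summed.

```python
import os
import stat

# Hypothetical table: find-style type letter -> test on st_mode.
TYPE_TESTS = {
    "b": stat.S_ISBLK,   # block special
    "c": stat.S_ISCHR,   # character special
    "d": stat.S_ISDIR,   # directory
    "p": stat.S_ISFIFO,  # named pipe (FIFO)
    "f": stat.S_ISREG,   # regular file
    "l": stat.S_ISLNK,   # symbolic link
    "s": stat.S_ISSOCK,  # socket
    "D": stat.S_ISDOOR,  # door (Solaris); always False elsewhere
}


def apparent_size(root, types="f"):
    """Sum st_size of all entries at or below root whose type letter is in types.

    The type filter only decides what is counted; directories are
    always recursed into and symlinks are never followed.
    """
    tests = [TYPE_TESTS[letter] for letter in types]
    total = 0

    def count(path):
        nonlocal total
        st = os.lstat(path)  # lstat: look at the link itself, not its target
        if any(test(st.st_mode) for test in tests):
            total += st.st_size

    count(root)
    # os.walk does not follow symlinks by default; symlinks to
    # directories still show up in dirnames, so they get counted as 'l'.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            count(os.path.join(dirpath, name))
    return total
```

With types="f" the total depends only on the file contents, whereas
adding "d" brings back the filesystem-dependent directory sizes.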