Wow, those are pretty neat invocations of find and awk. As you allude to, though, they also add an extra stat of each file. My proposed change simply picks up the group information that du already gets for free when it stats each file for its size, and which it currently throws in the bit bucket. Adding a per-user option seems useful too, since that information is also in the stat record. Efficiency matters, especially as storage density continues to outpace I/O throughput, IOPS, and compute.
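
For reference, a per-user variant of the pipeline quoted below might look roughly like this. It is only a sketch: du_by_user is a name I made up, it assumes GNU find (for -printf) plus sort and awk, and it still walks and stats the tree separately from du, which is the cost a du option would avoid.

```shell
# Hypothetical helper (not part of coreutils): sum apparent sizes per owner.
# Usage: du_by_user DIR [extra find predicates...]
# Assumes GNU find (-printf), sort, and awk.
du_by_user() {
    dir=$1
    shift
    # %u = owner, %s = size in bytes, %D = device number, %i = inode number
    find "$dir" "$@" -printf '%u %s %D %i\n' |
        # With -u, sort keeps one line per (device, inode) pair,
        # so hard-linked files are counted only once.
        sort -k3,3n -k4,4n -u |
        awk '{ total[$1] += $2 } END { for (u in total) print total[u], u }'
}
```

Something like `du_by_user /home -type f | numfmt --to=iec` would then give human-readable totals for regular files only. Replacing %s with %k should switch from apparent size to allocated blocks, as in the original pipeline.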

Sent from my iPhone

> On Jan 25, 2018, at 4:18 PM, Assaf Gordon <[email protected]> wrote:
>
> Hello Dan,
>
> Expanding on Eric's comments:
>
>> On Thu, Jan 25, 2018 at 02:42:32PM -0600, Eric Blake wrote:
>>> On 01/25/2018 12:11 PM, Daniel Gall wrote:
>>> coreutils-8.26> !diff
>>
>> We prefer 'git diff' output against the latest coreutils.git,
>> but any program which can produce unified diffs (diff -u) is better than
>> an ed script diff.
>
> Good starting points are here:
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/README-hacking
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/HACKING
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/.github/PULL_REQUEST_TEMPLATE.txt
>
>> A feature addition requires documentation, NEWS update, and preferably
>> testsuite additions to be complete
>
> A typical example of these required changes is here:
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=57dea5ed07471b2192cc5edf08993e663a3f6802
>
>
> Additionally, a work-around would be to combine several existing programs
> to get approximately similar information:
>
> First, use `find` to print the size (%s) and group (%g) of each
> file/directory:
>
>   $ find /home -printf "%g %s\n"
>   root 4096 /home
>   gordon 4096 /home/gordon
>   gordon 59 /home/gordon/.Xauthority
>   gordon 4096 /home/gordon/.cache
>   gordon 4096 /home/gordon/.cache/RStudio
>   ...
>
> Then, use `awk` to sum up the sizes per group:
>
>   $ find /home -printf "%g %s\n" \
>       | awk '{a[$1] += $2} END {for(i in a) { print a[i],i }}'
>   1044086087 gordon
>   542342 mike
>   4123 root
>
> And optionally, use `numfmt` to print human sizes:
>
>   $ find /home -printf "%g %s\n" \
>       | awk '{a[$1] += $2} END {for(i in a) { print a[i],i }}' \
>       | numfmt --to=iec
>   997M gordon
>   530K mike
>   4.1K root
>
>
> The above commands are rather naive, counting hard-links as many times
> as they appear (similar to 'du -l'), and showing the apparent size
> instead of allocated blocks (similar to 'du --apparent-size').
>
> To show allocated blocks, replace '%s' with '%k'.
>
> To count hardlinked files just once, print the device (%D) and inode
> number (%i) of each file, then use 'sort -u' to keep only one of each:
>
>   find /home -printf "%g %s %D %i\n" \
>       | sort -k3n,3 -k4n,4 -u \
>       | awk '{a[$1] += $2} END {for(i in a) { print a[i],i }}' \
>       | numfmt --to=iec
>
> This isn't as efficient as 'du', but could be used with existing programs
> without code modifications (and using find's many predicates allows
> fine-tuning of the summaries, e.g. per-user, per-user-and-group, etc.).
>
> regards,
>  - assaf
>
