Re: Du feature request - group reporting

Daniel Gall Thu, 25 Jan 2018 14:23:22 -0800

Wow, those are pretty neat invocations of find and awk.  As you allude to,
they also add an extra stat of each file.  My proposed change simply picks up
the group information that du already gets for free when it stats each file
for its size, and currently throws into the bit bucket.  Adding a user option
seems useful too, since that information is also in the stat record.
Efficiency is important, especially as storage density continues to outpace
I/O throughput, IOPS, and compute.
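
To illustrate the point about the stat record, here is a minimal sketch using
GNU stat (the helper name show_stat_fields is made up for this example, and
it assumes coreutils stat with the -c option).  A single stat of a file
already returns size, allocated blocks, group, and owner together:

```shell
# Hypothetical helper: print several fields that all come from one stat record.
# Assumes GNU coreutils stat (the -c / --format option).
show_stat_fields() {
    # %s = size in bytes, %b = allocated blocks,
    # %G = group name, %U = owner name, %n = file name
    stat -c '%s %b %G %U %n' -- "$@"
}
```

For example, `show_stat_fields /etc/passwd` prints one line with all five
fields; no second lookup is needed to learn the group or the owner.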



> On Jan 25, 2018, at 4:18 PM, Assaf Gordon <[email protected]> wrote:
> 
> Hello Dan,
> 
> Expanding on Eric's comments:
> 
>> On Thu, Jan 25, 2018 at 02:42:32PM -0600, Eric Blake wrote:
>>> On 01/25/2018 12:11 PM, Daniel Gall wrote:
>>> coreutils-8.26> !diff
>> 
>> We prefer 'git diff' output against the latest coreutils.git,
>> but any program which can produce unified diffs (diff -u) is better than
>> an ed script diff.
> 
> Good starting points are here:
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/README-hacking
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/HACKING
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/.github/PULL_REQUEST_TEMPLATE.txt
> 
>> A feature addition requires documentation, NEWS update, and preferably
>> testsuite additions to be complete.
> 
> A typical example of these required changes is here:
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=57dea5ed07471b2192cc5edf08993e663a3f6802
> 
> 
> 
> Additionally, a work-around would be to combine several existing programs
> to get approximately similar information:
> 
> First, use `find` to print the group (%g) and size (%s) of each
> file/directory (the path, %p, is included here only for illustration):
> 
>  $ find /home -printf "%g %s %p\n"
>  root    4096    /home
>  gordon  4096    /home/gordon
>  gordon  59    /home/gordon/.Xauthority
>  gordon  4096    /home/gordon/.cache
>  gordon  4096    /home/gordon/.cache/RStudio
>  ...
> 
> Then, use `awk` to sum up the sizes per group:
> 
>  $ find /home -printf "%g %s\n" \
>       | awk '{a[$1] += $2} END {for(i in a) { print a[i],i }}'
>  1044086087 gordon
>  542342 mike
>  4123 root
> 
> And optionally, use `numfmt` to print human sizes:
> 
>  $ find /home -printf "%g %s\n" \
>       | awk '{a[$1] += $2} END {for(i in a) { print a[i],i }}' \
>       | numfmt --to=iec
>  997M gordon
>  530K mike
>  4.1K root
> 
> 
> The above commands are rather naive, counting hard-links as many times
> as they appear (similar to 'du -l'), and showing the apparent size
> instead of allocated blocks (similar to 'du --apparent-size').
> 
> To show allocated blocks, replace '%s' with '%k' (disk usage in 1 KiB
> blocks; with numfmt, add '--from-unit=1024' so the totals scale as bytes).
> 
> To count hardlinked files just once, print the device (%D) and inode number
> (%i) of each file, then use 'sort -u' to keep only one of each:
> 
>  find /home -printf "%g %s %D %i\n" \
>    | sort -k3n,3 -k4n,4 -u \
>    | awk '{a[$1] += $2} END {for(i in a) { print a[i],i }}' \
>    | numfmt --to=iec
> 
> This isn't as efficient as 'du', but could be used with existing programs
> without code modifications (and using find's many predicates allows
> fine-tuning of the summaries, e.g. per-user, per-user-and-group, etc.).
> 
> regards,
> - assaf
> 
>
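
The per-user and per-user-and-group summaries mentioned in the quoted message
can be sketched roughly as follows.  This assumes GNU find (for -printf) and
any POSIX awk; the function names are made up for illustration, and unlike
the quoted commands the sketch is limited to regular files via -type f.  Like
the naive version, it counts hard links once per name and sums apparent
sizes:

```shell
# Hypothetical helpers; assume GNU find (-printf) and a POSIX awk.

# Sum apparent sizes of regular files under "$1", keyed by owner (%u).
size_by_user() {
    find "$1" -type f -printf '%u %s\n' \
        | awk '{a[$1] += $2} END {for (i in a) print a[i], i}'
}

# Same idea, keyed by "user:group" (%u:%g).
size_by_user_and_group() {
    find "$1" -type f -printf '%u:%g %s\n' \
        | awk '{a[$1] += $2} END {for (i in a) print a[i], i}'
}
```

Either function can be piped into numfmt in the same way as above, e.g.
`size_by_user /home | numfmt --to=iec`.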
