Re: cruft(-ng) and dh-cruft: handling and registering of dynamic files

2022-11-05 Thread Alexandre Detiste
Hi,

On Sun, 23 Oct 2022 at 04:24, Paul Wise wrote:
> Thank you for your work on this, being able to register files generated
> at install time by maintainer scripts or even at runtime by system
> maintenance tools to particular packages is a very useful feature for
> keeping all the files on a system more easily managed.

The "cpigs" command now has a new "-C" command-line switch
to output the ownership of all system files (static + volatile) in a single .csv.

I think this is something quite basic that can fill so many needs,
but it simply did not exist before.

"apt-file" could be adapted to also transparently cache this information.
End-users of this tool would get better results without
having to change their habits.

$ apt-file search /etc/subgid
[nothing]
$ cpigs -c | grep subgid
/etc/subgid;base-passwd;f;1;19
/etc/subgid-;base-passwd;f;1;0

The plan is to keep this .csv output stable,
whatever changes in the upstream dataflow,
which is now a mix of dpkg + alternatives
+ custom fallback scripts that know and replicate how
UCF, logrotate, initramfs, grub, systemd and sysvinit
manage volatile files inside their postinst/postrm.
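
Since the .csv is meant to stay stable, a consumer could be sketched
roughly like this. It is only an illustration: from the sample above I am
assuming that the first field is the path and the second the owning
package; the remaining fields are kept as opaque strings because their
meaning is not spelled out here. The function name is made up.

```python
import csv
import io

# Sample copied from the cpigs output shown above.
SAMPLE = """\
/etc/subgid;base-passwd;f;1;19
/etc/subgid-;base-passwd;f;1;0
"""


def owners_by_path(text):
    """Map each path to (package, remaining fields).

    Assumption: field 1 is the path and field 2 the package name;
    the other fields are passed through untouched.
    """
    result = {}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        path, package, *rest = row
        result[path] = (package, rest)
    return result


owners = owners_by_path(SAMPLE)
print(owners["/etc/subgid"][0])
```

A cache of this kind is what a tool like "apt-file" would need to answer
the /etc/subgid query above.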

> I do worry about users removing files that they don't understand, based
> on feedback by cpigs/cruft-ng, but they do that already so... :)

I have seen some complaints about this online, and I agree...
the original "cruft" tool looks more like an unfinished QA tool akin to piuparts
than an end-user tool to me.

> An ncdu or mc style interface (or plugins for those) to view cruft on a
> system sounds very useful in addition to the data export.

It's implemented, but the ncdu data model does not allow
inserting the matched package name for the volatile files.

It's still nice to use if you need to quickly identify
where the big volatile files are piling up and take action.
I have already done this in real life.

Greetings



Re: cruft(-ng) and dh-cruft: handling and registering of dynamic files

2022-10-22 Thread Paul Wise
On Sun, 2022-10-23 at 01:08 +0200, Alexandre Detiste wrote:

> This DebHelper works this way:
> * the "debian/cruft" list merely registers the glob patterns,
> * and the "debian/purge" list also adds an "rm -rf" stanza in postrm/purge.
> 
> As a bonus there's now also a new "cpigs" command, working akin to
> "dpigs" from Debian Goodies to list the biggest volatile data producers.

Thank you for your work on this, being able to register files generated
at install time by maintainer scripts or even at runtime by system
maintenance tools to particular packages is a very useful feature for
keeping all the files on a system more easily managed.

Potentially it could also prompt users before removing packages that
have registered data that won't be removed on purge, for example if a
package creates at the sysadmin's request a dir in /srv to host a
website, removing the package could warn about the directory. Or
removing postgres with databases present could warn about those.
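
As a rough illustration of that idea only: given the patterns a package
registers and the subset it removes on purge, the files that would
survive a purge can be computed and shown to the user. Everything below
(function name, package layout, paths) is hypothetical, and Python's
fnmatch is used as a stand-in matcher; the real tools may treat "*" and
"/" differently.

```python
from fnmatch import fnmatchcase


def survives_purge(existing_paths, cruft_patterns, purge_patterns):
    """Return registered paths that purging the package would leave behind."""

    def matches(path, patterns):
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    return [
        path
        for path in existing_paths
        if matches(path, cruft_patterns) and not matches(path, purge_patterns)
    ]


# Hypothetical package: it registers a site directory under /srv,
# but only its cache is removed on purge.
cruft_patterns = ["/srv/example-site/*", "/var/cache/example-site/*"]
purge_patterns = ["/var/cache/example-site/*"]
existing = ["/srv/example-site/index.html", "/var/cache/example-site/tmp.bin"]

left_behind = survives_purge(existing, cruft_patterns, purge_patterns)
if left_behind:
    print("Warning: these files will remain after purge:")
    for path in left_behind:
        print("  " + path)
```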

I do worry about users removing files that they don't understand, based
on feedback by cpigs/cruft-ng, but they do that already so... :)

> The plan now is to have a new option that dumps the whole
> matching result database as .json with individual file size
> for jq consumption or in my case Jupyter;
> this instead of implementing older requests (#291823 #487458 #527285).

An ncdu or mc style interface (or plugins for those) to view cruft on a
system sounds very useful in addition to the data export.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




cruft(-ng) and dh-cruft: handling and registering of dynamic files

2022-10-22 Thread Alexandre Detiste
Hi,

I have been working on the cruft/cruft-ng package since 2014;
there were a few setbacks over the years,
like the mlocate -> plocate & UsrMerge transitions,
but it's alive and kicking, helping to find random
lost files left behind by other packages
and to file bugs against those from time to time
to get these glitches resolved.



Recently I've been working a lot on it because I realized
it would be the perfect solution to audit the disk space
usage problems I'm facing at work.

So I somewhat whipped up what I remembered from my own proposal
https://wiki.debian.org/Cruft/purge and now have for myself a working
"dh-cruft" that I can use to register dynamic files
owned by some private .deb. Here "dh-cruft" is a must, I don't want to
pollute Debian with some random external data from downstream.

This DebHelper works this way:
* the "debian/cruft" list merely registers the glob patterns,
* and the "debian/purge" list also adds an "rm -rf" stanza in postrm/purge.
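
To make the registration idea concrete, here is a minimal sketch of how
registered glob patterns can explain files found on disk. It is not how
cruft-ng is implemented: the package names and patterns are invented, and
Python's fnmatch is only a stand-in matcher (its "*" also matches "/",
which may differ from the real matching rules).

```python
from fnmatch import fnmatchcase

# Hypothetical registrations: package name -> glob patterns, standing in
# for what each package would list in its "debian/cruft" file.
REGISTERED = {
    "example-daemon": [
        "/var/lib/example-daemon/*",
        "/var/log/example-daemon.log*",
    ],
    "example-cache": ["/var/cache/example/*.db"],
}


def owner_of(path, registered=REGISTERED):
    """Return the first package with a pattern matching path, or None."""
    for package, patterns in registered.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return package
    return None


def unexplained(paths, registered=REGISTERED):
    """Return the paths that no registered pattern accounts for."""
    return [p for p in paths if owner_of(p, registered) is None]


found = [
    "/var/lib/example-daemon/state.json",
    "/var/log/example-daemon.log.1",
    "/var/cache/example/index.db",
    "/var/tmp/leftover.bin",
]
print(unexplained(found))
```

Anything left in the "unexplained" list is a candidate for a bug report
against the package that dropped it.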

As a bonus there's now also a new "cpigs" command, working akin to
"dpigs" from Debian Goodies to list the biggest volatile data producers.


The plan now is to have a new option that dumps the whole
matching result database as .json with individual file size
for jq consumption or in my case Jupyter;
this instead of implementing older requests (#291823 #487458 #527285).
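
On the consuming side (a Jupyter cell, for instance), such a dump could
be summed per package along these lines. The option does not exist yet,
so the schema here (a list of objects with "path", "package" and "size"
keys) is purely an assumption for illustration, as are the sample values.

```python
import json
from collections import Counter

# Hypothetical dump: the real option and its schema are not defined yet.
DUMP = json.dumps(
    [
        {"path": "/var/cache/pkg-a/1", "package": "pkg-a", "size": 300},
        {"path": "/var/cache/pkg-a/2", "package": "pkg-a", "size": 200},
        {"path": "/var/log/pkg-b.log", "package": "pkg-b", "size": 120},
    ]
)


def biggest_producers(dump_text, limit=10):
    """Sum file sizes per package and return the largest totals first."""
    totals = Counter()
    for entry in json.loads(dump_text):
        totals[entry["package"]] += entry["size"]
    return totals.most_common(limit)


for package, size in biggest_producers(DUMP):
    print(size, package)
```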


I know it's a very old unresolved subject that has been lurking forever
here, but maybe it's the right time to look at it with a fresh view.

My proposal for next steps:
  * gather your comments here
  * some review of dh-cruft (I don't know Perl)
  * get it in the NEW queue soon
  * have interested packages take part;
    for now cruft-ng ships its own homegrown fallback database
  * (later): merge dh-cruft into DebHelper when it's basically "done"
  * (much much later): migrate some logic from DH to dpkg itself,
    with a more declarative packaging style;
    cruft-ng is already linked with the static library libdpkg
    and is bound to progress at the same pace.

  * there is still a performance problem in cruft-ng that I wish to address.
    Basic profiling can be done by setting the ELAPSED=1 env var.

Greetings,

Alexandre Detiste


./cpigs 30
496720816 apt
68957680 npm
61846660 linux-image-5.19.0-1-amd64 (the initrd)
61787431 linux-image-5.19.0-2-amd64
53131401 dlocate
36229735 aptitude
19621198 dpkg
17896745 plocate
13559874 jupyter-nbextension-jupyter-js-widgets
11982526 udev
11870208 openjdk-11-jre-headless
7257544 debconf
5704857 smartmontools
5685370 ttf-mscorefonts-installer
5086033 linux-image-5.18.0-4-amd64 -> rc state
4933502 grub-common
3550208 qgis
3523931 fontconfig
3421312 ucf
3231839 shared-mime-info
3063016 locales
2266947 libreoffice-common   (files seen from explain/ucf)
1901483 grub-pc-bin
1565651 logrotate
1258042 man-db
1107968 ALTERNATIVES (I thought these were only symlinks ?)
783313 popularity-contest
763776 unattended-upgrades   (du -b /var/log/unattended-upgrades/760422)
657496 breeze-icon-theme
625345 PYEXCEL (some pip3 automation)

