Hello,
On Wed, 8 May 2024 11:20:47 +0200 Hendrik Jaeger
<[email protected]> wrote:
Package: puppet-agent
Version: 7.23.0-1
Severity: minor
File: /usr/bin/puppet
X-Debbugs-Cc: [email protected]
Dear Maintainer,
* What led up to the situation?
I was trying to build an exclude list for my backups and went through the
contents of my filesystems.
* What was the outcome of this action?
I noticed that there are reports of puppet runs in /var/cache/puppet/reports.
* What outcome did you expect instead?
I expected all data in /var/cache and its subdirectories to be regenerable
and not to contain any information one might want to back up.
According to the FHS in
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s05.
> /var/cache is intended for cached data from applications. Such data is
> locally generated as a result of time-consuming I/O or calculation. The
> application must be able to regenerate or restore the data.
This is not the case for reports:
Puppet cannot regenerate the report for a specific run.
Also, "cache" usually refers to data that will be reused, which is not the case
for these reports.
/var/log seems a better fit for them.
In my concrete case, it seems suboptimal that these reports live in a directory
that I would like to exclude from backups: it should not contain anything worth
backing up anyway, since all data in there is supposed to be regenerable, and
these reports clearly are not.
The FHS "Rationale" even mentions this use case explicitly:
> The existence of a separate directory for cached data allows system
> administrators to set different disk and backup policies from other
> directories in /var.
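To make the backup concern concrete, here is a minimal sketch of the kind of
directory-based exclusion a backup policy might apply. The exclude list and the
report filename below are hypothetical examples, not taken from any particular
backup tool or from Puppet's actual file naming.

```python
from pathlib import PurePosixPath

# Hypothetical exclude list: directories whose contents a backup policy
# treats as regenerable (/var/cache) or disposable (/var/tmp, /tmp).
EXCLUDES = [PurePosixPath(p) for p in ("/var/cache", "/var/tmp", "/tmp")]


def is_excluded(path, excludes=EXCLUDES):
    """Return True if `path` is an excluded directory or lies below one.

    Comparison is per path component, so "/var/cachefoo" does not match
    "/var/cache" the way a naive string-prefix test would.
    """
    p = PurePosixPath(path)
    return any(p == ex or ex in p.parents for ex in excludes)


# Hypothetical report path under the Debian default reportdir.
report = "/var/cache/puppet/reports/host.example.org/202405081120.yaml"
print(is_excluded(report))  # True: the report is left out of the backup
print(is_excluded("/var/log/puppet/reports/host.example.org/202405081120.yaml"))  # False
```

Any policy that trusts the FHS promise for /var/cache silently drops the
reports, while the same files under /var/log would be kept.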
The argument has been made on IRC that reports are usually not stored locally
anyway. However, it seemed implied that the server would also store the reports
in a directory named "cache", albeit outside the FHS hierarchy, in
/opt/puppetlabs/puppet/cache/reports for a non-Debian installation. I have no
puppetserver installation on Debian at hand, so I don’t know how the Debian
package would behave.
Another argument has been made that the reports are stored in puppetdb and are
thus only stored temporarily as files on disk. IMHO that still wouldn’t make
them "cache" data; "temporary" data maybe, in which case they should probably go
to /var/tmp or /tmp.
Or, as https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s14.html mentions:
> /var/spool contains data which is awaiting some kind of later processing.
> Data in /var/spool represents work to be done in the future (by a program,
> user, or administrator); often data is deleted after it has been processed.
Both of these arguments are kind of OK under certain circumstances, but not
everybody is running puppetdb or even puppetserver. I am running Puppet
standalone, i.e. with `puppet apply`, so the reports will not be transferred to
a server and will not be consumed by puppetdb.
In any case, treating reports as "cached" data seems quite clearly wrong.
In the case of standalone puppet (i.e. `puppet apply`) IMHO they are "logs" and
should go to /var/log.
In the case of a puppet-agent (i.e. a Puppet client/agent connecting to a Puppet
server _without_ puppetdb), they should probably not be saved on the client at
all, but if they are, they are also "logs" IMHO and should be treated as
described above. On the server, they should also be treated as "logs", but not
necessarily go to /var/log like machine-local log data. I don’t have a concrete,
sensible suggestion for this case. Maybe /var/lib.
In the case of a puppetserver with puppetdb, they should probably not be saved
as files on the server at all. If they are not sent directly from the
puppetserver to puppetdb but are consumed later, they are probably "spool" data.
I agree that the default of "/var/cache/puppet/reports" is perhaps not ideal.
But instead of changing only "reportdir", we might want to change "vardir"
from "/var/cache/puppet" to something like "/var/puppet". I'm not sure that
anything Puppet puts inside "vardir" really qualifies as "cache".
I suspect the only reason it is that way is the naming choices made by
upstream a long time ago.
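To illustrate the difference between the two options, here is a minimal
sketch of how settings defined relative to vardir follow it. It assumes
upstream defaults such as reportdir = $vardir/reports and statedir =
$vardir/state (check with `puppet config print reportdir` on a real system);
the resolve() helper and the /var/log/puppet/reports path are hypothetical.

```python
from string import Template

# Assumed subset of upstream Puppet defaults that are defined relative to
# $vardir; verify on a real system with `puppet config print <setting>`.
VARDIR_DEFAULTS = {
    "reportdir": "$vardir/reports",
    "statedir": "$vardir/state",
    "clientbucketdir": "$vardir/clientbucket",
}


def resolve(vardir, overrides=None):
    """Expand $vardir in each default, then apply explicit per-setting overrides."""
    settings = {
        name: Template(value).substitute(vardir=vardir)
        for name, value in VARDIR_DEFAULTS.items()
    }
    settings.update(overrides or {})
    return settings


print(resolve("/var/cache/puppet")["reportdir"])  # /var/cache/puppet/reports
print(resolve("/var/puppet"))  # every derived directory moves with vardir
# Hypothetical override of reportdir alone: only the reports move.
print(resolve("/var/cache/puppet", {"reportdir": "/var/log/puppet/reports"}))
```

Changing vardir relocates every derived directory at once, whereas overriding
reportdir moves only the reports and leaves the rest under /var/cache.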
-- Jérôme