On 2018-11-15 18:18, Gene Heskett wrote:
On Thursday 15 November 2018 14:17:29 Austin S. Hemmelgarn wrote:
On 2018-11-15 13:36, Gene Heskett wrote:
On Thursday 15 November 2018 12:57:54 Austin S. Hemmelgarn wrote:
On 2018-11-15 11:53, Gene Heskett wrote:
On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:
On 2018-11-15 06:16, Gene Heskett wrote:
I ask because after last night's run it showed one huge and 3
teeny level 0's for the 4 new dle's. So I just re-adjusted the
locations of some categories and broke the big one up into 2
pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5
new dle's.
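A quick way to sanity-check a split like that is to see which top-level names each pattern would actually pick up. The sketch below uses Python's fnmatch as a stand-in for tar's own wildcard matching, which is an assumption, and the directory names are made up for illustration. It is only meant to show that names starting with a lowercase letter, a digit or a dot would land in neither pattern.

```python
from fnmatch import fnmatchcase

# Hypothetical top-level names; substitute a real directory listing.
names = ["Audio", "Photos", "Qt-src", "Zips", "backups", "2018-logs", ".cache"]

# The two include patterns from the message, without the leading "./".
patterns = {"A-P": "[A-P]*", "Q-Z": "[Q-Z]*"}

def assign(names, patterns):
    """Map each pattern label to the names it matches; collect the leftovers."""
    buckets = {label: [] for label in patterns}
    unmatched = []
    for name in names:
        for label, pat in patterns.items():
            if fnmatchcase(name, pat):  # case-sensitive glob match
                buckets[label].append(name)
                break
        else:
            # No pattern matched this name.
            unmatched.append(name)
    return buckets, unmatched

buckets, unmatched = assign(names, patterns)
print(buckets)
print(unmatched)
```

Anything that shows up in the second list would need a pattern (or a DLE) of its own.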
But an estimate does not show the new names that result.
I even took the estimate assignment calcsize back out of the
global dumptype, which, according to the manpage, forces the
estimates to be derived from a dummy run of tar. It didn't help.
Clues? Having this info from an estimate query might take a
couple of hours, but it sure would be helpful when redesigning
one's dle's.
I'm fairly certain you can't, because it specifically shows
server-side estimates, which have no data to work from if there has
never been a dump run for the DLE.
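To make the "no data to work from" point concrete, here is a minimal sketch of a history-only estimate. It is not Amanda's actual algorithm; the function name, the (level, size) history format and the plain average are all assumptions made for illustration.

```python
from statistics import mean

def server_style_estimate(history, level):
    """Average the sizes of past dumps at `level`; None if there are none.

    `history` is a hypothetical list of (level, size_in_kb) tuples.
    """
    sizes = [size for lvl, size in history if lvl == level]
    return mean(sizes) if sizes else None

# An established DLE has past dumps to average over.
old_dle = [(0, 41_000), (1, 900), (1, 1_100), (0, 43_000)]
# A brand-new DLE has never been dumped, so there is nothing to average.
new_dle = []

print(server_style_estimate(old_dle, 0))
print(server_style_estimate(new_dle, 0))
```

With an empty history the function has nothing to return, which is the situation a freshly added DLE is in.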
Even if you told it to use tar for the estimate phase? That has
enough legs to be called a bug. IMO anyway.
As mentioned in one of my other responses, I can kind of see the
value in this not bothering the client systems. Keep in mind that
server estimates cost nothing on the client, while calcsize or
client estimates may use a significant amount of resources.
My default has been calcsize for three or 4 years, changed because
tar was changed & was screwing up the estimates. I can remember 15+
years ago when I was using real tar estimates, on a much smaller
machine, and it could come within 50 megabytes of filling a DDS-2
tape (4 GB compressed) for weeks at a time. So that part of amanda
worked a lot better than it does today. And it's slowly gone to the
dogs as my system grew in complexity. And went in a handbasket when
I had to change to calcsize during the tar churn.
I've not been using AMANDA anywhere near as long as you have, but I've
actually not seen any issues with accuracy of 'estimate client' mode
estimates with current versions of GNU tar, except when the estimate
ran while data in the DLE was being modified (and in that case, it
makes sense that it would be bogus). I generally don't use 'estimate
client' on my own systems though because it consistently takes far
longer than 'estimate calcsize', and I'm not picky about the estimates
being perfect.
In this case, I do think the documentation should be a bit clearer,
Yes, but who is to rewrite it? He should know a heck of a lot more
about the amanda innards than I do even after 2 decades, and use
better defined words here and there too. diskdevice is a very
poor substitute for the far more common slanguage of "/path/to/".
and it would be useful to be able to get regular (calcsize and/or
client) estimates on-demand, but I do think that the default is
reasonably sane.
It may well be sane, we'll see how it works in the morning. AIUI,
calcsize runs only on old history, so that should not impose a load
on the client, even when the client is itself.
Unless I'm mistaken:
* 'estimate server' runs only on historical data, and doesn't even
talk to the client systems. It's good at limiting the impact the
estimate has on the client, but reliably gives bogus estimates if your
DLEs don't show consistent behavior (that is, each backup of a given
level is roughly the same size as every other backup at that level).
* 'estimate client' relies on the backup program being used to give it
info about how big it will be. It gives estimates that are close to
100% accurate, but currently essentially requires running the backup
process twice (once for the estimate, once for the actual backup) and
imposes a non-negligible amount of load on the client.
That depends on the client's instant duties. I have backed up a milling
machine while it was running 90 lines of code that took 3 days to finish
while sharpening a saw blade, with no apparent interaction, on a dual core
Atom powered box. One core was locked away for LCNC (isolcpus at work), the
other was free to do the backup client. Didn't bother it a bit. :)
* 'estimate calcsize' does something kind of in-between. AIUI, it
looks at some historical data, and also looks at the on-disk size of
the data,
That would take time to access the dle's, and the answer is effectively
instant, ergo it is not questioning the client(s), it has to be working
only from the history in its own logs.
Except that it actually runs on the client systems. I've actually
looked at this, the calcsize program is running on the clients and not
the server. It may be looking at the logs there, _but_ it's still
running on the client. It may also be _really_ fast in your setup, but
that doesn't inherently mean it's running locally (Amanda is smart
enough to spread out estimates across hosts and spindles just like it
does backups).
then factors in compression ratios and such to give an
estimate that's usually reasonably accurate without needing the DLEs
to be consistent or imposing significant load on the clients.
That compression is also in the logs, so calcsize can find that in a few
milliseconds.
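For what it's worth, the calcsize idea as described above (on-disk size, scaled by a compression ratio from earlier runs) can be sketched roughly as follows. This is not the real calcsize program; the function names and the past_ratio parameter are hypothetical, and the only thing it is meant to show is that the on-disk part requires walking the filesystem where the data lives.

```python
import os

def on_disk_bytes(root):
    """Sum the sizes of all files under `root` by walking the tree."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so symlinks are counted as links, not followed.
                total += os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total

def calcsize_style_estimate(root, past_ratio=1.0):
    """Scale the on-disk size by a compression ratio seen in past runs.

    `past_ratio` is an assumed input (compressed size / raw size).
    """
    return int(on_disk_bytes(root) * past_ratio)
```

Calling calcsize_style_estimate("/path/to/dle", 0.5) would stat every file under that path, so the walk has to happen on whichever machine holds the data.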