John R. Jackson wrote:
> >Even better: run the dumps till they start outputting data. Stall the
> >output pipe until the data is needed.
> >
> >That way the "estimation" step of dump that actually happens in the
> >end is not repeated for the eventual dump.
>
> That's an interesting idea, but would have several implementation
> problems. For one, Amanda runs one, two or three estimates of each file
> system, so even on a fairly small system you're talking about dozens
> of dump processes hanging around, possibly more than are allowed for a
> single user. On several of my systems we would be talking hundreds of
> processes (although they would mostly be asleep).
Hmmm. I was thinking about killing the "level 1" estimator once you
start on the "level 2", but indeed, we could end up doing the level 1
dump in the end, because the difference between the dump sizes turns
out to be too small. Right you are! (I didn't know this feature was
available until this morning, when I was fiddling with our amanda
config...)
> They would have to stay there until the planner stage completes, at which
> time we know which one or two (in case we have to drop back to degraded
> mode) will be used during the run, then any remaining could be killed.
> The data cannot just be stalled on the client. Dump outputs several
> blocks before the estimated size line and those would have to be saved.
> Then when the real dump is requested, those blocks would have to be
> sent along to the new socket connected to a dumper and the rest of the
> data flow would also have to be collected and sent along (much like it
> does now).
OK. That means you have a "process in the middle" that saves stdin to
a file and only starts writing to stdout once it gets a signal:
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#define BUFSIZE 8192

static volatile sig_atomic_t interrupted = 0;

void interrupt (int signum)
{
    interrupted = 1;
}

int main (void)
{
    char buf[BUFSIZE], tmpl[] = "/tmp/stallXXXXXX";
    struct sigaction sa;
    ssize_t n;
    int fd;

    /* No SA_RESTART, so a read() blocked in the loop below returns
     * with EINTR when the signal arrives, instead of being restarted. */
    sa.sa_handler = interrupt;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction (SIGUSR1, &sa, NULL);

    if ((fd = mkstemp (tmpl)) < 0)
        return 1;

    /* Phase 1: spool stdin to the temp file until SIGUSR1 arrives. */
    while (!interrupted) {
        n = read (0, buf, BUFSIZE);
        if (n > 0)
            write (fd, buf, n);
        else if (n == 0)
            pause ();   /* EOF already; just wait for the signal */
    }

    /* Phase 2: replay the spooled data, then pass the rest through. */
    lseek (fd, 0, SEEK_SET);
    while ((n = read (fd, buf, BUFSIZE)) > 0)
        write (1, buf, n);
    while ((n = read (0, buf, BUFSIZE)) > 0)
        write (1, buf, n);
    return 0;
}
Send it a SIGUSR1 to signal "OK, it's your turn to output the data",
a SIGSTOP to make it block the pipeline (and SIGCONT to resume), and a
SIGINT to make it abort the pipe....
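The SIGSTOP half of that protocol needs no cooperation from the middle
process at all: the kernel suspends it, and the pipe's own flow control
then stalls the writer upstream. A quick stand-in demonstration (sleep
plays the role of the middle process here; this is just an illustration,
not Amanda code):

```shell
# SIGSTOP suspends any process without its cooperation; 'sleep'
# stands in for the process-in-the-middle.
sleep 30 &
pid=$!
kill -STOP "$pid"
ps -o stat= -p "$pid"    # state contains 'T': stopped
kill -CONT "$pid"        # running again; the pipeline would resume
kill "$pid"
```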
> We would also probably need some kind of keep alive since it could be
> hours between the start of dump and the actual data motion.
> It would also have to allow for a dump to be aborted and restarted,
> which happens during direct to tape when an error is detected.
The restart would simply be done the "old-fashioned" way, right?
> Finally, this only applies to dump. GNU tar does its estimates in
> a completely different way (it literally does the dump to /dev/null,
> with an efficiency shortstop that skips any real I/O, and just outputs
> the total size when it is completely done) and would not gain anything
> with these changes.
Oh yeah. That nice "feature" of GNU tar where it only stats the files
and never reads the data. Brilliant. So you run
tar cvf /dev/null /cdrom
to verify that all data blocks on the CD-ROM are readable. Fooled you!
This is a feature that -=cannot=- be allowed to be on by default!
(It is, I know. )-;
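For what it's worth, the shortcut only triggers when tar can see that
its archive file is /dev/null; writing to a pipe hides that, so tar is
forced to read every file for real (a sketch, assuming GNU tar):

```shell
# GNU tar special-cases an archive of /dev/null: it stats the files
# but skips reading their contents, so this verifies nothing:
tar cf /dev/null /cdrom

# Writing to a pipe hides /dev/null from tar, forcing it to actually
# read every file -- the classic read-test workaround:
tar cf - /cdrom | cat > /dev/null
```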
> As I said, it's an interesting idea and worth some more thought, but
> will take significant changes to the way Amanda does things (which is
> not necessarily a bad thing).
--
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.