John R. Jackson wrote:
> >Even better: run the dumps till they start outputting data. Stall the
> >output pipe until the data is needed.
> >
> >That way the "estimation" step of dump that actually happens in the
> >end is not repeated for the eventual dump.
>
> That's an interesting idea, but would have several implementation
> problems. For one, Amanda runs one, two or three estimates of each file
> system, so even on a fairly small system you're talking about dozens
> of dump processes hanging around, possibly more than are allowed for a
> single user. On several of my systems we would be talking hundreds of
> processes (although they would mostly be asleep).
Hmmm. I was thinking about killing the "level 1" estimator once you
start on the "level 2", but indeed, we could end up doing the level 1
dump in the end, because the difference between the dump sizes turns
out to be too small. Right you are! (I didn't know this feature was
available until this morning, when I was fiddling with our amanda
config...)
> They would have to stay there until the planner stage completes, at which
> time we know which one or two (in case we have to drop back to degraded
> mode) will be used during the run, then any remaining could be killed.
> The data cannot just be stalled on the client. Dump outputs several
> blocks before the estimated size line and those would have to be saved.
> Then when the real dump is requested, those blocks would have to be
> sent along to the new socket connected to a dumper and the rest of the
> data flow would also have to be collected and sent along (much like it
> does now).
OK. That means you have a "process in the middle" that saves stdin to
a file and only starts writing to stdout once it gets a signal:
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#define BUFSIZE 8192

static volatile sig_atomic_t interrupted = 0;

void interrupt (int signum)
{
    interrupted = 1;
}

int main (void)
{
    char buf[BUFSIZE], tmpl[] = "/tmp/stallXXXXXX";
    struct sigaction sa;
    ssize_t n;
    int fd;

    /* No SA_RESTART, so a read() blocked in the loop below returns
     * with EINTR when the signal arrives, instead of being restarted. */
    sa.sa_handler = interrupt;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction (SIGUSR1, &sa, NULL);

    if ((fd = mkstemp (tmpl)) < 0)
        return 1;

    /* Phase 1: spool stdin to the temp file until SIGUSR1 arrives. */
    while (!interrupted) {
        n = read (0, buf, BUFSIZE);
        if (n > 0)
            write (fd, buf, n);
        else if (n == 0)
            pause ();   /* EOF already; just wait for the signal */
    }

    /* Phase 2: replay the spooled data, then pass the rest through. */
    lseek (fd, 0, SEEK_SET);
    while ((n = read (fd, buf, BUFSIZE)) > 0)
        write (1, buf, n);
    while ((n = read (0, buf, BUFSIZE)) > 0)
        write (1, buf, n);
    return 0;
}
Send it a SIGUSR1 to signal "OK, it's your turn to output the data",
a SIGSTOP to make it block the pipeline (and SIGCONT to resume), and a
SIGINT to make it abort the pipe....
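The SIGSTOP half of that protocol needs no cooperation from the middle
process at all: the kernel suspends it, and the pipe's own flow control
then stalls the writer upstream. A quick stand-in demonstration (sleep
plays the role of the middle process here; this is just an illustration,
not Amanda code):

```shell
# SIGSTOP suspends any process without its cooperation; 'sleep'
# stands in for the process-in-the-middle.
sleep 30 &
pid=$!
kill -STOP "$pid"
ps -o stat= -p "$pid"    # state contains 'T': stopped
kill -CONT "$pid"        # running again; the pipeline would resume
kill "$pid"
```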
> We would also probably need some kind of keep alive since it could be
> hours between the start of dump and the actual data motion.
> It would also have to allow for a dump to be aborted and restarted,
> which happens during direct to tape when an error is detected.
The restart would simply be done the "old-fashioned" way, right?
> Finally, this only applies to dump. GNU tar does its estimates in
> a completely different way (it literally does the dump to /dev/null,
> with an efficiency shortstop that skips any real I/O, and just outputs
> the total size when it is completely done) and would not gain anything
> with these changes.
Oh yeah. That nice "feature" of GNU tar where it only stats the files
and never reads the data. Brilliant. So you run
tar cvf /dev/null /cdrom
to verify that all data blocks on the CD-ROM are readable. Fooled you!
This is a feature that -=cannot=- be allowed to be on by default!
(It is, I know. )-;
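For what it's worth, the shortcut only triggers when tar can see that
its archive file is /dev/null; writing to a pipe hides that, so tar is
forced to read every file for real (a sketch, assuming GNU tar):

```shell
# GNU tar special-cases an archive of /dev/null: it stats the files
# but skips reading their contents, so this verifies nothing:
tar cf /dev/null /cdrom

# Writing to a pipe hides /dev/null from tar, forcing it to actually
# read every file -- the classic read-test workaround:
tar cf - /cdrom | cat > /dev/null
```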
> As I said, it's an interesting idea and worth some more thought, but
> will take significant changes to the way Amanda does things (which is
> not necessarily a bad thing).
--
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.