* Dustin J. Mitchell <[email protected]> [20100819 16:15]:
> On Thu, Aug 19, 2010 at 2:27 PM, Jean-Francois Malouin
> <[email protected]> wrote:
> > I know that a little while ago I reported this problem but I still
> > can't get my head around it: ever since I upgraded to amanda-3.1.x the
> > avg tape stats in amreport are bogus. I know for a fact that the
> > holddisk io read stats (from munin) --as this is when the taper starts
> > pushing stuff out of the holddisk to the tape-- are consistently above
> > 100MBs for the run yet amreport says ~14MBs. This lead me to suspect
> > that the starting time for the taper to start to write the first DLE
> > is wrongly set or parsed or something along this.
> 
> Aside from some estimates based on iostat, do you have evidence that
> something is incorrect? Are there times that are incorrect (perhaps
> looking at the timestamped debug logs can help determine this)?  Or
> sizes that are incorrect?  Or are the size/time calculations made
> incorrectly?

No, this is systematic, every single config (I have 7 on 4 servers!)
show the same trend, every single time. All DLEs seem to go way above
tape streaming speed but amreport says otherwize. I've looked at the
debug files and sizes are correctly reported and timestamps too.
Unless I'm missing something very obvious, I think the calculation is
done wrong or that the initial timestamp doesn't correspond to the
time when bytes are actually being written to tape.

See below for more on this.

> 
> Barring that, I think I know what you're seeing.
> 
> If you look in the rightmost column of the table you sent, you'll see
> write rates around 100MB/s.  These are calculated by taking the time
> and size from first byte to last of each part and summing.  Obviously
> the much shorter parts do not achieve this rate, probably because it
> includes time to spin the drive up at the beginning and flush buffers
> at the end.
> 
> The Avg Tape Write Rate, on the other hand, counts a lot of other
> stuff in its total.  It basically looks at the time between the
> beginning and end of the dump, which can include headers, filemarks,
> and even tape changes.

Here is a typical thing I see all the time
First, some context. All dumptypes have tape_splitsize set to 40GB
(for an LTO4) and

taperalgo largestfit
flush-threshold-dumped 100
flush-threshold-scheduled 100
taperflush 100
autoflush yes

which essentially means that for an LTO4 at 800GB amanda won't ask for
a new tape until there enough to fill it. So imagine the scenario
where there is a DLE that didn't make it to tape because the criteria
for a new tape is not met, is left on the holddisk until a new amdump
is run, at which point it will be flushed, and flushed first since,
say, it's the biggest (tapealgo). The flush won't start immediately at
the beginning of the run but only after some DLEs will have been
chunked to the holddisk, at which point the criteria for writing to a
new tape will be met.

amreport will always show that it was written at dismally slow speed
yet the taper debug logs show very fast (80 to 100MBs) writes.
*Always* *and* *only* for the first flushed DLE. Me thinks that the
initial timestamp for the start of write to tape is taken while no
bytes has made to tape yet, and that it includes part of the time it
takes for the estimates to finish and time to chunk new DLEs to
holddisk, or something like this. 

This is not a ranti btw! 
I'm just surprised that no one who have upgraded to 3.1.x have noticed
this! Don't people read their amreports :)

Dustin, thanks for the time and great work in 3.1.x!
jf


> 
> >From the perspective of the trace logs, the per-dump rates are based
> on the sec and kb entries in the PART lines; the Avg Tp Write Rate is
> based on the sec and kb entries in the DONE and PARTIAL lines.
> 
> This has been the historical precedent for the content of these log
> lines.  When we discovered this while rewriting the taper and
> reporting code, the immediate temptation was to "fix" it, but then
> we'd have a lot of people surprised that the times changed so much in
> 3.1.1, and I'd be explaining why we changed it.. you just can't win!
> 
> Dustin
> 
> P.S. I'll fix the cosmetic bug that amreport claims the tape write is
> in k/s when it's really in $displayunit/s.
> 
> -- 
> Open Source Storage Engineer
> http://www.zmanda.com

Reply via email to