* Dustin J. Mitchell <[email protected]> [20100819 16:15]: > On Thu, Aug 19, 2010 at 2:27 PM, Jean-Francois Malouin > <[email protected]> wrote: > > I know that a little while ago I reported this problem but I still > > can't get my head around it: ever since I upgraded to amanda-3.1.x the > > avg tape stats in amreport are bogus. I know for a fact that the > > holddisk io read stats (from munin) --as this is when the taper starts > > pushing stuff out of the holddisk to the tape-- are consistently above > > 100MBs for the run yet amreport says ~14MBs. This lead me to suspect > > that the starting time for the taper to start to write the first DLE > > is wrongly set or parsed or something along this. > > Aside from some estimates based on iostat, do you have evidence that > something is incorrect? Are there times that are incorrect (perhaps > looking at the timestamped debug logs can help determine this)? Or > sizes that are incorrect? Or are the size/time calculations made > incorrectly?
No, this is systematic, every single config (I have 7 on 4 servers!) show the same trend, every single time. All DLEs seem to go way above tape streaming speed but amreport says otherwize. I've looked at the debug files and sizes are correctly reported and timestamps too. Unless I'm missing something very obvious, I think the calculation is done wrong or that the initial timestamp doesn't correspond to the time when bytes are actually being written to tape. See below for more on this. > > Barring that, I think I know what you're seeing. > > If you look in the rightmost column of the table you sent, you'll see > write rates around 100MB/s. These are calculated by taking the time > and size from first byte to last of each part and summing. Obviously > the much shorter parts do not achieve this rate, probably because it > includes time to spin the drive up at the beginning and flush buffers > at the end. > > The Avg Tape Write Rate, on the other hand, counts a lot of other > stuff in its total. It basically looks at the time between the > beginning and end of the dump, which can include headers, filemarks, > and even tape changes. Here is a typical thing I see all the time First, some context. All dumptypes have tape_splitsize set to 40GB (for an LTO4) and taperalgo largestfit flush-threshold-dumped 100 flush-threshold-scheduled 100 taperflush 100 autoflush yes which essentially means that for an LTO4 at 800GB amanda won't ask for a new tape until there enough to fill it. So imagine the scenario where there is a DLE that didn't make it to tape because the criteria for a new tape is not met, is left on the holddisk until a new amdump is run, at which point it will be flushed, and flushed first since, say, it's the biggest (tapealgo). The flush won't start immediately at the beginning of the run but only after some DLEs will have been chunked to the holddisk, at which point the criteria for writing to a new tape will be met. amreport will always show that it was written at dismally slow speed yet the taper debug logs show very fast (80 to 100MBs) writes. *Always* *and* *only* for the first flushed DLE. Me thinks that the initial timestamp for the start of write to tape is taken while no bytes has made to tape yet, and that it includes part of the time it takes for the estimates to finish and time to chunk new DLEs to holddisk, or something like this. This is not a ranti btw! I'm just surprised that no one who have upgraded to 3.1.x have noticed this! Don't people read their amreports :) Dustin, thanks for the time and great work in 3.1.x! jf > > >From the perspective of the trace logs, the per-dump rates are based > on the sec and kb entries in the PART lines; the Avg Tp Write Rate is > based on the sec and kb entries in the DONE and PARTIAL lines. > > This has been the historical precedent for the content of these log > lines. When we discovered this while rewriting the taper and > reporting code, the immediate temptation was to "fix" it, but then > we'd have a lot of people surprised that the times changed so much in > 3.1.1, and I'd be explaining why we changed it.. you just can't win! > > Dustin > > P.S. I'll fix the cosmetic bug that amreport claims the tape write is > in k/s when it's really in $displayunit/s. > > -- > Open Source Storage Engineer > http://www.zmanda.com
