On Wednesday 02 June 2004 18:21, Glenn English wrote:
>On Wed, 2004-06-02 at 14:12, Gene Heskett wrote:
>> Any current ide drive can do 30+ Mb/sec if left
>> alone by other tasks, often quite a ways on the + side.
>
>Is that just a burst out of the cache, or can they read
> dis-contiguous files, seek around to other files, wait for latency,
> and write all at the same time that fast? Or even half that fast?
> If so, and if Linux and Intel's IDE controllers lose another 25%
> moving bits around, it'd still be comfortably faster than the tape
> drive. I think I may have something horribly misconfigured.
>

Well, in fairness, those are the hdparm -tT ratings I'm quoting, and
each is generally a burst of a few seconds, either from the cache or
from the surface itself. They do NOT take seek times and rotational
latency into consideration, but those probably shouldn't be a concern
within a single file transfer from disk to tape. And by 'file' I mean
the whole, completed backup of an individual disklist entry, or as we
call them, DLEs.
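
For a number closer to what a taper would actually see than a short
hdparm burst, one could time a plain sequential read of a big file
sitting in the holding disk. What follows is only a rough sketch in
Python; the function name and the 1 MiB chunk size are my own
assumptions, and nothing in it is amanda code. Bear in mind that a
file which was just written may come back from the kernel's cache
rather than from the platters, which will flatter the result.

```python
import time


def sustained_read_rate(path, chunk_size=1024 * 1024):
    """Read a file front to back; return (bytes_read, megabytes_per_second).

    Hypothetical helper for illustration. It times one pure sequential
    read with no other work interleaved.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate
```

Point it at a holding disk file of a few hundred megabytes while
nothing else is using that disk, then again while a dump is being
written to the same disk, and compare both figures with the tape
drive's rated streaming speed.
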
I'm inclined to ramble a bit, so bear with me folks.

I think the point here is that a pure read, with no writes interleaved,
from an individual disk (and controller too) to an individual tape
drive on its own controller, probably SCSI, should be fast enough to
stream even the most current tape drive on the market. None of these
to my knowledge contain any black magic such as is used in modern
digital video recorders.

The really fast data rates common in video formats such as the
Panasonic DVCPRO, originally a 25Mb/sec format, then 50Mb/sec, and for
HDTV now at 100Mb/sec, have not made it into the data storage business,
and probably never will. This is primarily because none of these are
"verbatim" formats. They do error correction based on hiding the error
from the human eye, and they do it to an already mpeg2'd (or better)
video stream. Much of that is based on data shuffling and hashing,
wherein the burst of bad data that would cause you to ditch a data
tape goes right on by, because that one, single, maybe 20 byte wide
dropout on the tape is shuffled around until it's a one bit error in
many pixels' worth of data scattered out over the whole frame of video.
With data replacement techniques based on what the adjacent data is,
you never see it until the error rate is more than 50 bytes per
kilobyte.

Back to here, and now I'm trying to sound like an expert, but I'm
neither carrying a briefcase, nor am I more than 50 miles from home,
one wag's definition of an expert. :-)

The ideal situation would be for the backup that's being optionally
gzipped (bring cpu horsepower, all you can get) and stored in holding
disk two not to be on the same disk, controller and cable as holding
disk one, so that one could be doing a read and transfer to the tape
while two is receiving the backup from tar|gzip or whatever.
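
Going back to the video shuffling for a moment, here is a toy sketch in
Python of why a burst dropout gets so diluted. It is a plain block
interleaver with a made-up frame of 20 rows of 50 bytes, and is only my
illustration of the general idea; it is not the shuffling scheme any
real video format uses, and all the names are hypothetical.

```python
def interleave(data: bytes, rows: int) -> bytes:
    """Treat data as a rows x cols grid (row-major); emit it column by column."""
    if len(data) % rows:
        raise ValueError("data length must be a multiple of rows")
    cols = len(data) // rows
    return bytes(data[r * cols + c] for c in range(cols) for r in range(rows))


def deinterleave(data: bytes, rows: int) -> bytes:
    """Inverse of interleave(): rebuild the original row-major order."""
    if len(data) % rows:
        raise ValueError("data length must be a multiple of rows")
    cols = len(data) // rows
    return bytes(data[c * rows + r] for r in range(rows) for c in range(cols))


def errors_per_row(original: bytes, recovered: bytes, rows: int) -> list:
    """Count the damaged bytes that land in each row of the frame."""
    cols = len(original) // rows
    counts = [0] * rows
    for i, (a, b) in enumerate(zip(original, recovered)):
        if a != b:
            counts[i // cols] += 1
    return counts


def demo() -> list:
    rows, cols = 20, 50                     # assumed toy frame size
    frame = bytes(rows * cols)              # 1000 zero bytes stand in for video
    on_tape = bytearray(interleave(frame, rows))
    on_tape[100:120] = b"\xff" * 20         # one 20 byte dropout on the tape
    recovered = deinterleave(bytes(on_tape), rows)
    return errors_per_row(frame, recovered, rows)
```

Calling demo() shows the 20 byte dropout ending up as a single bad byte
in each of the 20 rows, which is the kind of damage that concealment
from neighbouring data can paper over. A data tape gets no such
forgiveness, since every byte has to come back verbatim.
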

One of the tools amanda uses to prevent disk access contention is the
spindle number given optionally in the DLE. Each physical disk should
have its own, unique spindle number. This same number is used for all
the DLEs that are on that disk. The next disk gets a different number,
etc etc.

Now, I know that you can give amanda more than one holding disk
specification, but what I don't know is how amanda determines which
holding disk to use for each DLE. If someone more familiar with the
code than I could bail me out here, it might become more obvious to
this user what he must do to best alleviate his problem. Currently I
see it as needing a pair of individual disks, each on its own
controller, for use as holding disks, but I cannot advise how to make
amanda do the correct ping-ponging to help end the shoeshining of his
tape drive. Of course such a scheme will probably be a bad puppy and
make a mess on the rug when the DLEs are widely different in size (and
compression usage).

One thing that hasn't been mentioned, because it's overshadowed by the
larger picture, is that if the drive is using its internal compressor,
then amanda has only a SWAG (maybe +/- 30% or more) of the tape's true
capacity. Amanda counts the bytes fed down the cable to the drive,
after any gzipping has been done if it's used. With the drive's
compressor turned off, those are the same bytes that land on the tape,
so amanda can know to well within a percent or so how much data she
can stuff onto that tape, making maximum use of the available
resources. This also explains why we generally recommend that the
drive's compressor be turned off forever.

The nice thing about the way amanda does its compression is that each
client can be told to do its own compression, thereby offloading that
time consuming chore from the server. Since each client can do its own
compression, adding clients doesn't slow you down, since they can all
run in parallel with minimal or no interaction other than maybe cat5
collisions.
But those are recovered so quickly in most cases that with 100baseT
circuits and normal drives, it's no big deal. Just bear in mind that
data fed straight to that drive off the network, because of something
fubar in the holding disk setup, will really make the drive shuffle
tape.

I think I finally ran down... Maybe someplace a light came on?

Funny, I can remember when we had exactly this same shoeshining
problem with 120 meg QIC drives running on 25 MHz 386sx boxes with
7Mb/sec ISA buses. Then the only cure really was a faster box. Please
don't call me a dinosaur though, even if my temper resembles a T-Rex's
occasionally. :)

>> If you are not using spindle numbers in your disklist, maybe it
>> would help to prevent thrashing of seeks all over the place
>> because more than one dumper is attacking the drive
>> simultainiously.
>
>I am. It helped a lot.
>
>> This might mean that the tape would stop and do a bit of
>> shoeshining in between files, but a given file should be able to
>> be 'poured down the pipe' non-stop.
>
>That'd be one 'buzz-squinch-buzz' per dump file. That's a
> possibility. I'll look into it. Also an argument against thousands
> of partitions.
>
>> There is also an algorythm string in amanda.conf that adjusts the
>> dumporders a bit, I have mine set to to the largest dump first, so
>> that once its done, there is a good chance the rest of the thing
>> is already in the holding disk and I get the drives maximum speed
>> once it actually starts.
>
>That I didn't know about at all. I'll go find it.
>
>> In this case, it seems he needs two disks assigned as holding
>> disks, with the hope that amanda would write to one, then the
>> other, alternating such that the one being written was not being
>> read by a taper at the same time.
>
>Now that's silly :-) Amanda's creating big, contiguous files
> designed to stream a tape drive. Disk drives are supposed to be
> vastly faster than the tape.
> From what you said earlier, that's
> where I think I need to focus attention.
>
>There and maybe just a little on reducing SCSI snobbery :-) Very
>informative. Thanks.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.23% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
