Olivier already provided good answers; I'll just add a bit.

On Fri, Feb 28, 2014 at 10:35:08AM +0700, Olivier Nicole wrote:
> Michael,
> ...
> > 3) I had figured that when restoring, amrestore has to read in a
> > complete dump/tar file before it can extract even a single file. So if
> > I have a single DLE that's ~2TB that fits (with multiple parts) on a
> > single tape, then to restore a single file, amrestore has to read the
> > whole tape. HOWEVER, I'm now testing restoring a single file from a
> > large 2.1TB DLE, and the file has been restored, but the amrecover
> > operation is still running, for quite some time after restoring the
> > file. Why might this be happening?
>
> You're touching the essence of tapes here: they are sequential access.
>
> So in order to access one specific DLE on the tape, the tape has to
> position at the very beginning of the tape and read everything until it
> reaches that DLE (the nth file on the tape).
Most (all?) current tape formats and drives can fast forward looking for
end-of-file marks.  Amanda knows the position of the file on the tape and
will have the drive go at high speed to that tape file.

For formats like LTO, which have many tracks on the tape, I think it is
even faster.  I "think" a TOC records where (i.e. which track) each file
starts, so the drive doesn't have to fast forward and back 50 times to
get to the "tenth" file, which is on the 51st track.

> Then it has to read sequentially all of that file containing the backup
> of a DLE to find the file(s) you want to restore. I am not sure about
> dump, but I am pretty sure that if your tar backup were a file on a
> disk instead of a file on a tape, it would read sequentially from the
> beginning of the tar file, in a similar way.
>
> Then it has to read until the end of the tar (not sure about dump) to
> make sure that there are no other file(s) satisfying your extraction
> criteria.
>
> So yes, if the file you want to extract is at the beginning of your
> tar, it will continue reading for a certain amount of time after the
> file has been extracted.

Another reason this happens is the "append" feature of tar.  It is
possible that a second, later version of the same file is in the tar
archive.  Amanda does not use this feature, but tar does not know that.
If you see that the file you want has been recovered, you can interrupt
amrecover.

> > The recover log shows this on the client doing the recovery:
> >
> > [root@cfile amRecoverTest_Feb_27]# tail -f
> >     /var/log/amanda/client/jet1/amrecover.20140227135820.debug
> > Thu Feb 27 17:23:12 2014: thd-0x25f1590: amrecover:
> > stream_read_callback: data is still flowing
> >
> > 3a) Where is the recovered dump file written to by amrecover? I can't
> > see space being used for it on either server or client. Is it
> > streaming and untar'ing in memory, only writing the desired files to
> > disk?

The tar file is not written to disk by amrecover.
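Both behaviors — only the wanted files ever being materialized, and the reader running past the first hit in case a later appended copy exists — can be imitated with Python's tarfile module in its non-seekable stream mode.  A small sketch (the archive contents and member name are made up; this is an illustration, not anything Amanda itself does):

```python
import io
import tarfile

def build_tar_with_duplicate(name):
    """Simulate 'tar --append': two entries in one archive share the
    same member name; the later entry is the current version."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for payload in (b"old version\n", b"new version\n"):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf

def stream_extract_last(stream, wanted):
    """Read the archive strictly front-to-back (mode 'r|' forbids
    seeking, much like a tape) and keep only the LAST match.
    Nothing but the wanted member's bytes is ever retained."""
    data = None
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            if member.name == wanted:
                data = tar.extractfile(member).read()
            # Can't stop at the first hit: a later appended copy may
            # supersede it, so the reader runs to end-of-archive.
    return data

print(stream_extract_last(build_tar_with_duplicate("etc/motd"), "etc/motd"))
# prints b'new version\n'
```

Note that the loop keeps consuming the stream even after a match, which is exactly why amrecover appears to "hang" after your file has already landed on disk.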
The desired files are extracted as the tar archive streams.

> In the directory from where you started the amrecover command. With
> tar, it will create the exact same hierarchy, reflecting the original
> DLE.
>
> Try:
>
> find . -name myfilename -print

I strongly suggest you NOT use amrecover to extract directly to the
filesystem.  Extract the files into a temporary directory and, once you
are sure they are what you want, copy/move them to their correct
location.

> ...
> > So assuming all the above is true, it'd be great if amdump could
> > automatically break large DLEs into smaller DLEs, to end up with
> > smaller dump files and faster restores of individual files. Maybe it
> > would happen only for level 0 dumps, so that incremental dumps would
> > still use the same sub-DLEs used by the most recent level 0 dump.

Sure, great idea.  Then all you would need to configure is one DLE
starting at "/", and Amanda would break things up into sub-DLEs.  Nope,
sorry, Amanda asks the backup admin to do that part of the config.
That's why you get the big bucks ;)

> > The issue I have is that with 30TB of data, there'd be lots of manual
> > fragmenting of data directories to get more easily-restorable DLE
> > sizes of, say, 500GB each. Some top-level dirs on my main data drive
> > have 3-6TB each, while many others have only 100GB or so. Manually
> > breaking these into smaller DLEs once is fine, but since data gets
> > regularly moved, added and deleted, things would quickly change and
> > upset my smaller DLEs.

I'll bet that if you try, you will be able to make some logical splits.

> > Any thoughts on how I can approach this? If amanda can't do it, I
> > thought I might try a script to create DLEs of a desired size based
> > on disk usage, then run the script every time I wanted to do a new
> > level 0 dump. That of course would mean telling amanda when I wanted
> > to do level 0s, rather than amanda controlling it.
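For what it's worth, the kind of script Michael describes could start from du-style directory sizes and pack them greedily into groups of roughly the target size.  A toy Python sketch, with made-up paths and sizes echoing the 3-6TB vs. ~100GB spread described above (this is not anything Amanda provides):

```python
def group_dirs_into_dles(sizes_gb, target_gb):
    """Greedy first-fit: pack top-level dirs (dir -> size in GB) into
    DLE groups whose totals stay at or under target_gb.  Oversized
    dirs land in a group of their own and still need manual splitting."""
    groups = []  # each group: {"total": <GB>, "dirs": [...]}
    # Largest first, so big dirs claim groups before small fillers do.
    for d, sz in sorted(sizes_gb.items(), key=lambda kv: (-kv[1], kv[0])):
        g = next((g for g in groups if g["total"] + sz <= target_gb), None)
        if g is None:
            g = {"total": 0, "dirs": []}
            groups.append(g)
        g["total"] += sz
        g["dirs"].append(d)
    return groups

sizes = {"/data/big": 3000, "/data/b": 450, "/data/c": 120,
         "/data/d": 380, "/data/e": 90}
for g in group_dirs_into_dles(sizes, target_gb=500):
    print(g["total"], g["dirs"])
```

Each resulting group would become one DLE (e.g. via include lists).  The catch is exactly the one Michael raises: the groupings shift every time data moves, so each level 0 may carve the tree up differently.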
Using a scheme like that, when it comes time to recover data, which DLE
held the object last summer?  Remember that when you are asked to
recover some data, you will probably be under time pressure, with
clients and bosses looking over your shoulder.  That's not the time you
want to fumble around trying to determine which DLE the data is in.

Jon
--
Jon H. LaBadie                  [email protected]
 11226 South Shore Rd.          (703) 787-0688 (H)
 Reston, VA 20190               (609) 477-8330 (C)
