One small comment inserted below.

On Feb 27, 2014, at 11:33 PM, Jon LaBadie <[email protected]> wrote:

> Olivier already provided good answers; I'll just add a bit.
>
> On Fri, Feb 28, 2014 at 10:35:08AM +0700, Olivier Nicole wrote:
>> Michael,
>>
> ...
>>
>>> 3) I had figured that when restoring, amrestore has to read in a complete
>>> dump/tar file before it can extract even a single file. So if I have a
>>> single DLE that's ~2TB that fits (with multiple parts) on a single tape,
>>> then to restore a single file, amrestore has to read the whole tape.
>>> HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE,
>>> and the file has been restored, but the amrecover operation is still
>>> running, for quite some time after restoring the file. Why might this be
>>> happening?
>>
>> You're touching on the essence of tapes here: they are sequential access.
>>
>> So in order to access one specific DLE on the tape, the tape has to
>> position at the very beginning of the tape and read everything until it
>> reaches that DLE (the nth file on the tape).
>>
>
> Most (all?) current tape formats and drives can fast forward looking
> for end-of-file marks. Amanda knows the position of the file on the
> tape and will have the drive go at high speed to that tape file.
>
> For formats like LTO, which have many tracks on the tape, I think it
> is even faster. I "think" a TOC records where (i.e. on which track) each
> file starts. So it doesn't have to fast forward and back 50 times to
> get to the "tenth" file, which is on the 51st track.
>
>> Then it has to read sequentially through the whole file containing the
>> backup of that DLE to find the file(s) you want to restore. I am not sure
>> about dump, but I am pretty sure that if your tar backup were a file on a
>> disk instead of a file on a tape, it would be read sequentially from the
>> beginning of the tar file in a similar way.
>>
>> Then it has to read until the end of the tar (not sure about dump) to
>> make sure that there are no other files matching your extraction
>> criteria.
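A small sketch of why tar keeps reading after the file you asked for has already come out: tar cannot know that an archive holds only one copy of each path, because tar's append mode (-r) can add a later copy of the same path to the end. The script below, with made-up file names in a throwaway directory, builds such an archive and then extracts just that one member. tar scans the whole archive and the later copy overwrites the earlier one. This illustrates plain tar behaviour only; it is not how Amanda writes its archives.

```shell
#!/bin/sh
# Sketch: tar must read to the end of an archive, because a later copy
# of the same path may have been appended with `tar -r`.
# All paths are throwaway examples, not Amanda paths.
set -eu

work=$(mktemp -d)            # scratch directory, left in place for inspection
cd "$work"

mkdir src
echo "version 1" > src/notes.txt
tar -cf backup.tar src/notes.txt        # archive holds version 1

echo "version 2" > src/notes.txt
tar -rf backup.tar src/notes.txt        # append a second copy of the same path

# The listing shows the same member name twice.
tar -tf backup.tar

# Extract only that member into a separate restore directory.
# tar reads the whole archive; the second copy replaces the first.
mkdir restore
tar -xf backup.tar -C restore src/notes.txt
cat restore/src/notes.txt               # prints "version 2"
```

The same reasoning is why, once you see the file you wanted has been recovered, interrupting the extraction loses nothing as long as you know the archive was never appended to.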
>>
>> So yes, if the file you want to extract is at the beginning of your tar,
>> it will continue reading for a certain amount of time after the file has
>> been extracted.
>
> Another reason this happens is the "append" feature of tar. It is
> possible that a second, later version of the same file is in the tar
> file. Amanda does not use this feature, but tar does not know this.
> If you see the file you want has been recovered, you can interrupt
> amrecover.
>
>>> The recover log shows this on the client doing the recovery:
>>>
>>> [root@cfile amRecoverTest_Feb_27]# tail -f
>>> /var/log/amanda/client/jet1/amrecover.20140227135820.debug
>>> Thu Feb 27 17:23:12 2014: thd-0x25f1590: amrecover: stream_read_callback:
>>> data is still flowing
>>>
>>> 3a) Where is the recovered dump file written to by amrecover? I can't see
>>> space being used for it on either server or client. Is it streaming and
>>> untar'ing in memory, only writing the desired files to disk?
>>
> The tar file is not written to disk by amrecover. The desired files are
> extracted as the tar archive streams.
>
>> In the directory from which you started the amrecover command. With tar,
>> it will create the exact same hierarchy, reflecting the original DLE.
>>
>> Try:
>>
>> find . -name myfilename -print
>
> I strongly suggest you NOT use amrecover to extract directly to the
> filesystem. Extract them in a temporary directory and once you are
> sure they are what you want, copy/move them to their correct location.

To make this completely clear (i.e. "restoring guide for idiots"):

- cd /tmp/something
- amrecover …..

The files will be restored into /tmp/something, which is your current
directory when you typed the amrecover command.

>
> ...
>>> So assuming all the above is true, it'd be great if amdump could
>>> automatically break large DLEs into small DLEs to end up with smaller
>>> dump files and faster restore of individual files. Maybe it would happen
>>> only for level 0 dumps, so that incremental dumps would still use the same
>>> sub-DLEs used by the most recent level 0 dump.
>
> Sure, great idea. Then all you would need to configure is one DLE
> starting at "/". Amanda would break things up into sub-DLEs.
>
> Nope, sorry, Amanda asks the backup admin to do that part of the
> config. That's why you get the big bucks ;)
>
>>
>>> The issue I have is that with 30TB of data, there'd be lots of manual
>>> fragmenting of data directories to get more easily restorable DLE sizes
>>> of, say, 500GB each. Some top-level dirs on my main data drive have 3-6TB
>>> each, while many others have only 100GB or so. Manually breaking these into
>>> smaller DLEs once is fine, but since data gets regularly moved, added and
>>> deleted, things would quickly change and upset my smaller DLEs.
>
> I'll bet if you try you will be able to make some logical splits.
>>>
>>> Any thoughts on how I can approach this? If Amanda can't do it, I thought I
>>> might try a script to create DLEs of a desired size based on disk usage,
>>> then run the script every time I wanted to do a new level 0 dump. That of
>>> course would mean telling Amanda when I wanted to do level 0s, rather than
>>> Amanda controlling it.
>
> Using a scheme like that, when it comes to recovering data, which DLE
> was the object in last summer? Remember that when you are asked to
> recover some data, you will probably be under time pressure with clients
> and bosses looking over your shoulder. That's not the time you want
> to fumble around trying to determine which DLE the data is in.
>
> Jon
> --
> Jon H. LaBadie [email protected]
> 11226 South Shore Rd. (703) 787-0688 (H)
> Reston, VA 20190 (609) 477-8330 (C)
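Michael's idea of a script that carves directories into DLEs of a target size could start as something like the sketch below. It is hypothetical throughout: the function name group_dirs, the cap, and the output layout are made up here. It only proposes groupings (first-fit, largest directories first) from "size<TAB>path" lines like those GNU du prints, and it does not write Amanda disklist entries; that part stays manual. A directory larger than the cap still ends up alone in an oversized group and would need splitting by hand. Given Jon's point about restores under pressure, it prints the directory-to-group mapping so that mapping can be kept.

```shell
#!/bin/sh
# Hypothetical helper: group directories into candidate DLEs of at most
# CAP size units. Input on stdin: "size<TAB>path" lines (GNU du format).
# Output: "group<TAB>size<TAB>path", suitable for saving as a dated map.
# Paths containing tabs or newlines are not handled.

group_dirs() {
    cap=$1
    # Largest directories first, so big ones open their own groups.
    sort -t "$(printf '\t')" -k1,1nr |
    awk -F '\t' -v cap="$cap" '
    {
        size = $1 + 0; path = $2
        # First-fit: place the directory in the first group with room.
        placed = 0
        for (g = 1; g <= n; g++) {
            if (used[g] + size <= cap) { used[g] += size; placed = g; break }
        }
        # No group has room (or size > cap): open a new group.
        if (!placed) { n++; used[n] = size; placed = n }
        printf "group%d\t%d\t%s\n", placed, size, path
    }'
}
```

For example, with GNU du and a placeholder path: `du -s --block-size=1G /data/* | group_dirs 500 > dle-map-$(date +%F).txt`. Keeping those dated map files is what would let you answer "which DLE was it in last summer?" without guessing.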
