one small comment inserted below

On Feb 27, 2014, at 11:33 PM, Jon LaBadie <[email protected]>
 wrote:

> Oliver already provided good answers, I'll just add a bit.
> 
> On Fri, Feb 28, 2014 at 10:35:08AM +0700, Olivier Nicole wrote:
>> Michael,
>> 
> ...
>> 
>>> 3) I had figured that when restoring, amrestore has to read in a complete
>>> dump/tar file before it can extract even a single file. So if I have a
>>> single DLE that's ~2TB that fits (with multiple parts) on a single tape,
>>> then to restore a single file, amrestore has to read the whole tape.
>>> HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE,
>>> and the file has been restored, but the amrecover operation is still
>>> running, for quite some time after restoring the file. Why might this be
>>> happening?
>> 
>> You're touching the essence of tapes here: they are sequential access.
>> 
>> So in order to access one specific DLE on the tape, the drive has to
>> position at the very beginning of the tape and read everything until it
>> reaches that DLE (the nth file on the tape).
>> 
> 
> Most (all?) current tape formats and drives can fast-forward looking
> for end-of-file marks.  Amanda knows the position of the file on the
> tape and will have the drive go at high speed to that tape file.
> 
> For formats like LTO, which have many tracks on the tape, I think it
> is even faster.  I "think" a TOC records where (i.e. which track) each
> file starts.  So it doesn't have to fast forward and back 50 times to
> get to the "tenth" file which is on the 51st track.
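If you want to see that positioning yourself outside Amanda, the standard
mt(1) tool does the same kind of filemark seek (the device name /dev/nst0
here is an assumption; use your site's non-rewinding tape device):

```shell
# Rewind, then skip forward over 9 filemarks to land at the start
# of the 10th tape file -- the drive seeks at high speed instead of
# streaming all the intervening data past the head.
mt -f /dev/nst0 rewind
mt -f /dev/nst0 fsf 9

# Show the drive status (including the current file number)
# to confirm where we ended up.
mt -f /dev/nst0 status
```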
> 
>> Then it has to read sequentially through the whole file containing the
>> backup of a DLE to find the file(s) you want to restore. I am not sure
>> about dump, but I am pretty sure that if your tar backup was a file on
>> a disk instead of a file on a tape, it would read sequentially from the
>> beginning of the tar file, in a similar way.
>> 
>> Then it has to read until the end of the tar (not sure about dump) to
>> make sure that there is no other file(s) satisfying your extraction
>> criteria.
>> 
>> So yes, if the file you want to extract is at the beginning of your tar,
>> it will continue reading for a certain amount of time after the file has
>> been extracted.
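You can convince yourself of the sequential-scan behaviour with a tiny
on-disk demo (all file and directory names here are invented):

```shell
# Build a small throwaway archive with a couple of members.
mkdir -p demo/a demo/b
printf 'one\n' > demo/a/first.txt
printf 'two\n' > demo/b/second.txt
tar -cf demo.tar demo

rm -r demo                # pretend the data is gone

# Ask for a single member: tar reads the archive sequentially from
# the beginning and keeps scanning to the end, even after the
# requested file has already been written out.
tar -xf demo.tar demo/a/first.txt
cat demo/a/first.txt      # -> one
```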
> 
> Another reason this happens is the "append" feature of tar.  It is
> possible that a second, later version of the same file is in the tar
> file.  Amanda does not use this feature but tar does not know this.
> If you see the file you want has been recovered, you can interrupt
> amrecover.
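The append behaviour is easy to demonstrate with plain tar, too
(throwaway names again):

```shell
# First version of the file goes into the archive.
printf 'version 1\n' > note.txt
tar -cf appended.tar note.txt

# Append a second, newer copy of the same path with tar -r.
printf 'version 2\n' > note.txt
tar -rf appended.tar note.txt

# On extraction tar keeps reading the whole archive and the later
# copy wins -- which is exactly why it cannot stop at the first match.
rm note.txt
tar -xf appended.tar note.txt
cat note.txt   # -> version 2
```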
> 
>>> The recover log shows this on the client doing the recovery:
>>> 
>>> [root@cfile amRecoverTest_Feb_27]# tail -f
>>> /var/log/amanda/client/jet1/amrecover.20140227135820.debug
>>> Thu Feb 27 17:23:12 2014: thd-0x25f1590: amrecover: stream_read_callback:
>>> data is still flowing
>>> 
>>> 3a) Where is the recovered dump file written to by amrecover? I can't see
>>> space being used for it on either server or client. Is it streaming and
>>> untar'ing in memory, only writing the desired files to disk?
>> 
> The tar file is not written to disk by amrecover.  The desired files are
> extracted as the tar archive streams.
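That streaming extraction is the same thing you get when tar reads from a
pipe; here is a small stand-in for the stream coming off the server (names
invented for the demo):

```shell
# Build a throwaway archive to stand in for the stream off the tape.
mkdir -p data
printf 'keep me\n' > data/wanted.txt
printf 'skip me\n' > data/other.txt
tar -cf stream.tar data
rm -r data

# Pipe the archive into tar: the extracting side never writes the
# archive itself to disk, only the member we asked for.
cat stream.tar | tar -xf - data/wanted.txt
cat data/wanted.txt    # -> keep me
```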
> 
>> In the directory from where you started the amrecover command. With tar,
>> it will create the same exact hierarchy, reflecting the original DLE.
>> 
>> try:
>> 
>> find . -name myfilename -print
> 
> I strongly suggest you NOT use amrecover to extract directly to the
> filesystem.  Extract them in a temporary directory and once you are
> sure they are what you want, copy/move them to their correct location.

To make this completely clear (i.e. a "restoring guide for idiots"):
-  cd /tmp/something
-  amrecover …..

The files will be restored into /tmp/something, which was your current
directory when you typed the amrecover command.
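A typical cautious session might then look like this (the config name
"DailySet1", the host, and the paths are all placeholders for your own
setup; sethost/setdisk/add/extract are standard amrecover subcommands):

```shell
# Extract somewhere safe first, never straight onto the live data.
mkdir -p /tmp/restore-scratch
cd /tmp/restore-scratch
amrecover DailySet1          # config name is an example

# Then, inside the amrecover shell:
#   sethost client.example.com   # client the backup came from
#   setdisk /data                # the DLE to browse
#   cd some/sub/dir
#   add myfilename
#   extract
# The recovered hierarchy appears under /tmp/restore-scratch.
```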


> 
> ...
>>> So assuming all the above is true, it'd be great if amdump could
>>> automatically break large DLE's into small DLE's to end up with smaller
>>> dump files and faster restore of individual files. Maybe it would happen
>>> only for level 0 dumps, so that incremental dumps would still use the same
>>> sub-DLE's used by the most recent level 0 dump.
> 
> Sure, great idea.  Then all you would need to configure is one DLE
> starting at "/".  Amanda would break things up into sub-DLEs.
> 
> Nope, sorry, Amanda asks the backup admin to do that part of the
> config.  That's why you get the big bucks ;)
> 
>> 
>>> The issue I have is that with 30TB of data, there'd be lots of manual
>>> fragmenting of data directories to get more easily-restorable DLE's sizes
>>> of say, 500GB each. Some top-level dirs in my main data drive have 3-6TB
>>> each, while many others have only 100GB or so. Manually breaking these into
>>> smaller DLE's once is fine, but since data gets regularly moved, added and
>>> deleted, things would quickly change and upset my smaller DLE's.
> 
> I'll bet if you try you will be able to make some logical splits.
>>> 
>>> Any thoughts on how I can approach this? If amanda can't do it, I thought I
>>> might try a script to create DLE's of a desired size based on disk-usage,
>>> then run the script every time I wanted to do a new level 0 dump. That of
>>> course would mean telling amanda when I wanted to do level 0's, rather than
>>> amanda controlling it.
> 
> Using a scheme like that, when it comes time to recover data, which
> DLE held the object last summer?  Remember that when you are asked to
> recover some data, you will probably be under time pressure, with clients
> and bosses looking over your shoulder.  That's not the time you want
> to be fumbling around trying to determine which DLE the data is in.
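For what it's worth, here is a rough sketch of the du-based grouping
script being discussed (the host name, dumptype, and disklist line format
are only illustrative, the demo tree stands in for the real data disk,
and Jon's point about finding the data later still stands):

```shell
# Demo tree standing in for the real data disk; in real use, point
# TOP at the disk and set LIMIT to ~500 GB in KB (524288000).
mkdir -p datadisk/projects datadisk/media datadisk/home
dd if=/dev/zero of=datadisk/projects/f bs=1024 count=300 2>/dev/null
dd if=/dev/zero of=datadisk/media/f    bs=1024 count=900 2>/dev/null
dd if=/dev/zero of=datadisk/home/f     bs=1024 count=200 2>/dev/null

TOP=datadisk
LIMIT=1000     # bucket size in KB, scaled down for the demo

# Next-fit grouping: walk the top-level dirs largest-first and start
# a new bucket whenever the current one would overflow.  Each output
# line is a candidate disklist entry (all fields illustrative).
du -sk "$TOP"/* | sort -rn | awk -v limit="$LIMIT" '
NR == 1 { bucket = 1 }
{
    size = $1; path = $2
    if (used + size > limit && used > 0) { bucket++; used = 0 }
    used += size
    printf "cfile %s-part%d %s comp-user-tar\n", path, bucket, path
}' > disklist.generated

cat disklist.generated
```

Note this breaks on paths containing spaces, and it regenerates the
grouping from scratch each run, which is exactly the "which DLE was it
in last summer" problem Jon describes.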
> 
> Jon
> -- 
> Jon H. LaBadie                 [email protected]
> 11226 South Shore Rd.          (703) 787-0688 (H)
> Reston, VA  20190              (609) 477-8330 (C)

