On Fri, Apr 12, 2013 at 12:59:39 -0400, Chris Hoogendyk wrote:
> As a followup, in case anyone cares to discuss technicalities and
> examples, has anyone run into this before? It seems any site doing
> lots of sizable scanned images, or GIS systems with tiff maps, would
> have run into it. I don't know how often sparse file treatment is an
> important thing. Database files can be sparse, but proper procedure
> is to use the database tools (e.g. mysqldump) for backups and not to
> just backup the data directory. It's not clear to me exactly what
> gnutar is doing with sparse or why it is so inefficient (timewise).
> I don't think these tif files are sparse. They are just large. And
> gnutar is not just doubling the time as described in
> http://www.gnu.org/software/tar/manual/html_node/sparse.html. I was
> experiencing on the order of 400 times as much time for the sparse
> option compared to when I removed the sparse option.
I have been meaning to reply to your earlier messages but haven't had a
chance to finish the background research I wanted to do first;
meanwhile, a few quick comments and questions:
* When you did your manual test runs with and without --sparse, did the
estimated sizes shown at the end of the run change any?
* I'll have to go back and see if things were any different with GNU tar
v1.23, but when I was looking at the latest version's source code, it
was clear that at least the intention was for --sparse to change
behavior only when the input files are sparse -- so I am
curious to know for sure whether your tiff files are actually sparse.
The check that tar uses to decide this is to see if the inode's block
count times the block size for the filesystem is less than the inode's
listed file size. (That is, does the file have less space allocated
than its listed size?)
Here are a few ways I've used in the past to search for sparse files:
- if you have GNU "ls" installed on this system:
ls -sl --block-size=1
and then check to see if the number in the first column is smaller
than the number in the "file size" column.
- if you have GNU "stat" installed you can run
stat -c "%n: alloc: %b * %B size: %s"
, and then check to see if the %b times %B value is less than the %s
value.
- using the standard Sun "ls", you can do
ls -sl
, and then multiply the value in the first column by 512. (I assume
the "block size" used is a constant 512 in that case, regardless of
file system.)
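To tie those methods together, here's a rough sketch of tar's sparseness
test as a script (assuming GNU truncate/stat are available and the
filesystem supports holes, e.g. ext4 -- the file names are just for
illustration):

```shell
# Create one genuinely sparse file (all hole, no data blocks allocated)
# and one fully-allocated file with the same listed size, then apply
# tar's test: is (allocated blocks * block size) < listed file size?
truncate -s 10M sparse.img                      # hole only
dd if=/dev/zero of=full.img bs=1M count=10 2>/dev/null

for f in sparse.img full.img; do
    alloc=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))
    size=$(stat -c %s "$f")
    if [ "$alloc" -lt "$size" ]; then
        echo "$f: sparse (allocated $alloc < size $size)"
    else
        echo "$f: not sparse"
    fi
done
```

(Exact allocated-block counts vary by filesystem, but on a typical ext-style
filesystem the truncate'd file shows far fewer blocks than its listed size.)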
* The doubling of the time mentioned in the man page is in the context
of making an actual dump, but the slowdown is much worse for the
estimate phase. That's because normally when tar notices that the
output file is /dev/null, it realizes that you don't actually want the
data from the input files, and thus doesn't actually read through
their contents, but simply looks at the file size (from the inode
information) and adds that to the dump-size tally before moving on to
the next file. So the time spent during the estimate is almost
entirely due to reading through the directory tree, and won't depend
on the size of the files in question.
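In other words, the normal estimate boils down to summing the
inode-reported sizes over the tree without opening any files. A rough
model of that (assuming GNU find's -printf; the demo files here are
made up for illustration, not tar's actual code):

```shell
# Model of the non---sparse estimate: sum st_size for every regular
# file in the tree, without ever read()ing any file contents.
mkdir -p estdemo
truncate -s 5M estdemo/a.img
truncate -s 3M estdemo/b.img

find estdemo -type f -printf '%s\n' | awk '{total += $1} END {print total}'
# prints 8388608  (5 MiB + 3 MiB of listed size, regardless of allocation)
```

Note the total is the same whether those files are sparse or not, which is
why the estimate normally runs so fast.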
In the case of a file that's actually sparse, though, if the --sparse
option is enabled then tar has to actually read in the entire file to
see how much of it is zero blocks. So, if many of your files are
indeed actually sparse, then what will happen is that the estimate
time will be about the same as the actual dump time, rather than the
usual much-shorter estimate time.
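One way to see both the payoff and the cost of --sparse is to archive a
file that is entirely a hole with and without the option and compare the
archive sizes (a sketch assuming GNU tar, where -S is short for --sparse):

```shell
# A 100 MiB file that is all hole -- no data blocks allocated.
truncate -s 100M hole.img

# Without --sparse, tar stores all 100 MiB of zeros in the archive.
tar -cf plain.tar hole.img

# With --sparse, tar reads through the whole file, finds only zero
# blocks, and records just a hole map -- a tiny archive, but at the
# cost of that full read pass (the pass that inflates your times).
tar -Scf sparse.tar hole.img

ls -l plain.tar sparse.tar    # sparse.tar is a few KiB at most
```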
Nathan
----------------------------------------------------------------------------
Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239