Bug#286634: marked as done (tar: --no-wildcards -X slowness)

Debian Bug Tracking System Sun, 03 Apr 2016 07:28:04 -0700

Your message dated Sun, 3 Apr 2016 14:22:12 +0000
with message-id <[email protected]>
and subject line Re: large exclusions unacceptably slow
has caused the Debian Bug report #221482,
regarding tar: --no-wildcards -X slowness
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
221482: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=221482
Debian Bug Tracking System
Contact [email protected] with problems

--- Begin Message ---

Package: tar
Version: 1.14-2
Severity: normal

As part of a backup strategy, I make use of tar --no-wildcards -X
along with a file listing the files I don't want backed up this
time.

I read a long time ago (and subsequently can't find) that you can
speed up the processing of the exclusion file very much by supplying
--no-wildcards. This is natural, because without using wildcards, you
can just keep a hash of all files to exclude, and see whether each
file in turn exists in the hash. I'm pretty sure I read that tar
implemented the exclude file as a hash in this case, am I right?
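
To illustrate the distinction drawn above, here is a minimal Python sketch
(not tar's actual code). The helper names and sample paths are hypothetical;
it only assumes that wildcard patterns must each be tried in turn, while
literal names can be looked up in a hash-based set.

```python
import fnmatch


def load_exclusions(lines):
    """Build a set of literal exclusion names, skipping blank lines."""
    return {line.rstrip("\n") for line in lines if line.strip()}


def excluded_by_scan(path, patterns):
    """Wildcard case: try every pattern, so cost grows with len(patterns)."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)


def excluded_by_hash(path, exclusions):
    """No-wildcard case: a single set membership test per file."""
    return path in exclusions


# Hypothetical exclude-file contents, one name per line.
exclude_lines = ["home/user/cache/a.tmp\n", "home/user/cache/b.tmp\n", "\n"]
exclusions = load_exclusions(exclude_lines)

print(excluded_by_hash("home/user/cache/a.tmp", exclusions))
print(excluded_by_scan("home/user/cache/a.tmp", ["home/user/cache/*.tmp"]))
```

With literal names only, the per-file cost of the set lookup does not depend
on how many exclusions were loaded, which is the behaviour I was expecting
from --no-wildcards.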

Yet, there is something wrong. With an 11MB exclude file containing
171000 files to exclude, tar's memory increases to 13MB or so. So far
so good -- it's slurped the exclude file into a hash, right?

But if I strace the tar process, I can see it takes about 0.5 seconds
to search for the existence of *each file* in the exclude file. This
takes... some time... when there are half a million files to back up.

Sure, my box is slow, but this sounds wrong - hashes are meant to be
very fast, aren't they? So I wrote a simple Perl program to read the
same exclude file into memory, and look up some random files. Perl took
up about 20MB of memory, or so, and it found my three random files
instantaneously after the file was slurped into that hash (it only
needs to be slurped in once, and even then, it wasn't overly slow).

So, does tar not keep a hash, am I not supplying the correct argument
to tar, or is the hash implementation broken?



-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.4.26
Locale: LANG=en_AU, LC_CTYPE=en_AU (charmap=ISO-8859-1)

Versions of packages tar depends on:
ii  libc6                       2.3.2.ds1-18 GNU C Library: Shared libraries an

-- no debconf information

--- End Message ---

--- Begin Message ---

On Tue, 18 Nov 2003 11:59:58 -0500 Brian Ristuccia
<[email protected]> wrote:
> Package: tar
> Version: 1.13.25-2
> 
> Tar will consume inordinate amounts of cpu time if you use the
> --exclude-from=file with a large file or many --exclude options. The time
> spent winds up being O(potential_archive_members * total_exclusions), which
> for any meaningful number of exclusions is a very, very big number. For
> example, with an 8MB exclude list containing approx. 90,000 items, tar runs at
> a rate of about 1 file per second on a 1400MHz PIII with 512KB CPU cache!
> Even if my tape drive and disks were infinitely fast, it would still take
> approximately 43 hours for tar to back up my system with 158205 files.
> 
> The attached patch causes tar to use a hashed data structure to store the
> exclusions, resulting in acceptable performance. See
> http://www.proudman51.freeserve.co.uk/tar.html
> 
> -- 
> Brian Ristuccia

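For reference, the estimate quoted above can be checked with a few lines of
Python. This is only a back-of-the-envelope sketch using the figures from the
report (158205 files, roughly 90,000 exclusions, about 1 file per second);
the variable names are mine.

```python
# Figures taken from the quoted report; the rate is the reporter's estimate.
files = 158205
exclusions = 90_000
seconds_per_file = 1.0

# Worst case for a linear scan: every file is compared to every exclusion.
comparisons = files * exclusions

# Wall-clock time at the observed rate, ignoring tape and disk speed.
hours = files * seconds_per_file / 3600

print(f"{comparisons:,} comparisons, about {hours:.1f} hours")
```
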
This bug was fixed but apparently not marked as "-done" in the 1.23
upload.  Fixing that now with this mail. :)

Thanks,
~Niels

--- End Message ---
