BTW, this is just random ramblings from me, I don't claim to be any
sort of expert (esp. compression programming), more like a "power
user" (if even that).

On Thu, Aug 29, 2013 at 4:23 PM, Eric Auer <e.a...@jpberlin.de> wrote:
> [Tom]
>>> the very idea of 7zip is to tar first (internally), then compress.
> [Bojan]
>> Very idea of 7zip is a specific compression algorithm, not a way
>> the compressing utilities work. :)
> Actually you are BOTH right. As Rugxulo already mentioned, there is
> a difference between archives where each file inside is compressed
> separately and "compact" archives. The latter put all files in one
> big block of data and compress that. The default mode of 7ZIP is
> to make a COMPACT archive with a SPECIFIC compression algorithm.

IIRC, the default mode of 7za is -mx5 and .7z format using LZMA [or
LZMA2 in newer alphas] method, which means (among other things) 16 MB
dictionary, aka 2^24, aka LZMA:24 (but you can adjust that, e.g.
"-ms=32m -m0d=32m" [LZMA:25] seems to compress slightly better, if the
file is bigger than 16 MB).
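For what it's worth, you can poke at the dictionary-size knob from Python's stdlib lzma module (just a rough sketch, not 7-Zip itself; but dict_size here is the same LZMA parameter those 7-Zip switches tune):

```python
import lzma

data = b"some repetitive sample data for the demo... " * 3000  # ~130 KB

# pick the LZMA2 dictionary size explicitly: 1 << 25 = 32 MB, aka LZMA:25
filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 25}]
blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

assert lzma.decompress(blob) == data
```

A bigger dictionary only helps, of course, when the input is bigger than the old dictionary, same as noted above.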

Bzip2 is a BWT method with block sizes from 100 KB up to 900 KB only.
Gzip is Deflate with a 32 KB window (dictionary). Zip has had various
methods, but the default has been Deflate for a very long time.
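All three of those live in Python's stdlib, so a quick (unscientific) size comparison on the same input is easy to sketch:

```python
import bz2
import lzma
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 3000  # ~135 KB

gz_size = len(zlib.compress(data, 9))         # Deflate: 32 KB window
bz_size = len(bz2.compress(data, 9))          # BWT: up to 900 KB blocks
xz_size = len(lzma.compress(data, preset=9))  # LZMA2: much bigger dictionary

print(len(data), gz_size, bz_size, xz_size)
```

Exact numbers depend entirely on the input, but the window/block/dictionary sizes above are why the stronger methods usually pull ahead on big files.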

Just to clarify, .7z format can have Bzip2 or Deflate methods (or Ppmd
or others). Even .ZIP format can officially support Bzip2 or even LZMA
method (or others).
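You can even see that "container vs. method" split from Python's zipfile module, which writes .ZIPs with Deflate, Bzip2, or LZMA inside the same container:

```python
import io
import zipfile

data = b"mixed methods inside one container format " * 1000

sizes = {}
for label, method in [("deflate", zipfile.ZIP_DEFLATED),
                      ("bzip2", zipfile.ZIP_BZIP2),
                      ("lzma", zipfile.ZIP_LZMA)]:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as z:
        z.writestr("payload.bin", data)   # same file, three methods
    sizes[label] = buf.tell()
```

(Not every unzipper out there handles the non-Deflate methods, which is probably why Deflate stays the default.)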

> ZIP is non-compact and otherwise comparable to TAR.GZ in strength
> of algorithm.

Yes, hence .ZIP compresses slightly worse overall but provides some
file separation, which can (in limited cases) make it easier to
recover some files from a corrupted archive. Normally, though, (some
of the fancier) archivers add a bit of redundancy data instead, and
even that is limited and not really a good replacement for full
backups.

.gz seems mostly meant for streaming, as it doesn't really support
anything beyond a very minimal header. You can, though, concatenate
several .gz files, and gunzip will still decompress them all correctly
as one stream, but that's rarely done (in my limited experience).
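That concatenation trick is easy to check with Python's gzip module (which behaves like gunzip here):

```python
import gzip

# two independent .gz streams, simply concatenated back to back
blob = gzip.compress(b"first member, ") + gzip.compress(b"second member")

# gunzip-style decompressors walk all members and join the output
assert gzip.decompress(blob) == b"first member, second member"
```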

Of course, only .tar saves *nix permissions info; .ZIP was less
friendly (by default) since it was DOS-oriented. Yes, you can kludge
it with your own workarounds (and who knows what can optionally be
saved in "extra fields"; better check appnote.txt), but because of
that, I think, most people on *nix still don't use .ZIP very much.
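The permissions point is visible in Python's tarfile module: the Unix mode bits are part of every tar header, so they round-trip for free (a small in-memory sketch):

```python
import io
import tarfile

payload = b"#!/bin/sh\necho hello\n"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo("bin/hello.sh")
    info.size = len(payload)
    info.mode = 0o755            # rwxr-xr-x, stored in the tar header
    tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    mode = tar.getmember("bin/hello.sh").mode

assert mode == 0o755             # executable bit survived the round trip
```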

.gz (Deflate, i.e. LZ77 plus Huffman coding) was actually just meant
as a patent-free replacement for "compress" .Z, which used (IIRC,
now-unpatented) LZW.

> So if you compress many similar files, TAR.GZ gives you the smaller archive.

"Usually" but not always. 7-Zip provides its own "improved" Deflate,
which is slightly better (tries harder, gives up less easily) than the
algorithm typically used in such encoders. I'm not sure of the details
(no EOS markers?). Long story short: .ZIP has bigger internal headers,
but it's a very minor difference overall, so it's still technically
possible to use (for instance) 7-Zip to create a .ZIP that is actually
smaller than a (normal) "GNU gzip"-produced .gz file (for the same
input).
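The header-overhead difference itself is small but measurable; here's a sketch with Python's stdlib (same Deflate level inside both containers):

```python
import gzip
import io
import zipfile

data = b"the same Deflate data, two different wrappers " * 200

gz_size = len(gzip.compress(data, 9))

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as z:
    z.writestr("data.bin", data)
zip_size = buf.tell()

# .gz adds ~18 bytes of header/trailer; .zip adds a local file header,
# a central directory entry, and an end-of-central-directory record
assert zip_size > gz_size
```

So with identical Deflate streams the .ZIP comes out a bit bigger, and it takes a better encoder (like 7-Zip's) to flip that around.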

> More modern algorithms like BZIP2 will
> often compress data better, but will spend much more RAM and CPU
> time in doing that. So TAR.BZ2 is smaller than TAR.GZ which uses

"Usually" smaller. And yes, Bzip2 was never really ported (AFAIK) to
anything less than 32-bit machines, not least because of the 900 KB
max blocksize (versus 32 KB) and of course its slower speed overall.
At risk of stating the obvious, there's always a tradeoff between
compressed output size and (de)compression speed and RAM usage.
At risk of sounding snobbish (unintentionally), I think Gzip (Deflate)
is very weak. For very large files, it's inefficient, and thus I
wouldn't recommend it directly these days. Though a 35 MB file vs. 50
MB is (at least to most "modern" people) not a difference worth
worrying about (sadly). These days you can't do anything without a
fast network connection and tons of RAM and tons of disk space.

> You do not usually have to make a TAR and GZIP or BZIP2 it
> separately with a pipeline: Both functions are usually combined
> behind one command, in particular in DOS where pipelines are not
> efficient to use. In Linux or Windows, it could happen that the
> modules internally communicate via pipes without you noticing:

Dunno, honestly! It's complicated. But GNU tar does allegedly support
extracting only certain files. Of course, I'm not a big *nix (nor tar)
user, so I never use that feature. Well, DJGPP's "djtar -x -o
blah/readme.txt -p blah.tgz" is sometimes useful (decompresses and
unarchives all at once).   :-)
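Python's tarfile module can do the same single-member trick, for what it's worth (decompress the stream and pull out just one file; hypothetical names, obviously):

```python
import io
import tarfile

# build a small .tgz in memory with two members
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name in ("blah/readme.txt", "blah/other.dat"):
        body = name.upper().encode()
        info = tarfile.TarInfo(name)
        info.size = len(body)
        tar.addfile(info, io.BytesIO(body))

# decompress and pull out just the one member, djtar-style
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    text = tar.extractfile("blah/readme.txt").read()

assert text == b"BLAH/README.TXT"
```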

> In both scenarios, you do not need to have the big, uncompressed
> "throw all files in one TAR" file lying around on your harddisk
> while processing a TAR.GZ (TGZ) or TAR.BZ2 (TBZ) file, luckily!

We're way beyond the point of "most" people caring. How big is latest
Linux kernel sources (with drivers)? Usually I consider that "pretty
darn big"!


linux-3.10.10.tar.gz    29-Aug-2013 17:58  105M
linux-3.10.10.tar.bz2   29-Aug-2013 17:58   83M
linux-3.10.10.tar.xz    29-Aug-2013 17:58   70M
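Eric's point about never needing the big intermediate .tar can be sketched with tarfile's "pipe" modes, which write/read the compressed archive as one forward-only stream:

```python
import io
import tarfile

sink = io.BytesIO()

# "w|gz" writes a gzip-compressed tar as one stream: the uncompressed
# .tar never exists anywhere, on disk or in full in memory
with tarfile.open(fileobj=sink, mode="w|gz") as tar:
    body = b"streamed straight into gzip"
    info = tarfile.TarInfo("notes.txt")
    info.size = len(body)
    tar.addfile(info, io.BytesIO(body))

# "r|gz" reads it back the same way, again without seeking
sink.seek(0)
with tarfile.open(fileobj=sink, mode="r|gz") as tar:
    names = [member.name for member in tar]

assert names == ["notes.txt"]
```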

> However, compact archives also have disadvantages: You cannot
> remove files from them without recompressing the whole thing.
> Adding files might also work less well than for "uncompact"
> formats.

Well, (sequential-only) "tape" archiving is inherently harder to use
than a seekable format, right? But it's fairly well established. What
else is there "widely available" on *nix: cpio, pax, ...? (I'm not
claiming to know all of this well.)

> Each of multiple files archived in a ZIP exists in
> a separate area of the ZIP, so it is easy to add or remove a
> file from a ZIP

It's easy but not totally super duper simple to add or remove because
you still have to disassemble and reassemble the archive if you insert
or delete anything, not to mention updating the (overall) central
directory structure (CDS) at the end. And potential .ZIP comments make
that even slightly harder.
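Appending at least is cheap; Python's zipfile shows what an archiver does in that case (new member's data goes at the end, then the central directory gets rewritten):

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("a.txt", "first file")

# mode "a" appends the new member's data and rewrites the central
# directory at the end; the old member's bytes are left untouched
with zipfile.ZipFile(buf, "a", zipfile.ZIP_DEFLATED) as z:
    z.writestr("b.txt", "second file")

with zipfile.ZipFile(buf) as z:
    names = z.namelist()

assert names == ["a.txt", "b.txt"]
```

Deleting from the middle is the expensive case, since everything after the hole has to shift and the central directory offsets all change.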

> or unpack a single file without having to go
> through the whole ZIP and unpack all data to find it.

The main thing to remember here is that Gzip (.gz) and .ZIP (Deflate)
use the same compression, so it's just the extra container fluff
(headers, central directory) that makes the difference.
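You can even demonstrate that with the stdlib: strip the gzip wrapper off a .gz blob and what remains is a raw Deflate stream, the same thing a .zip member body holds:

```python
import gzip
import zlib

data = b"the same raw Deflate bits under both containers " * 100

gz = gzip.compress(data, mtime=0)

# strip the fixed 10-byte gzip header and 8-byte CRC/size trailer;
# what's left is raw Deflate (wbits=-15 means "no zlib/gzip framing")
raw = gz[10:-8]
assert zlib.decompress(raw, -15) == data
```

(The 10-byte header assumption holds here because no filename or extra fields are stored; with those flags set, the header grows.)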

> As Rugxulo mentions, 7ZIP also supports "less compact" ways of
> archiving. That could mean storing information about contents
> in a more accessible way and compressing the big, compact blob
> of data in not-so-big chunks. This could allow you to unpack
> only the 10 MB of your 100 MB BIGSTUFF.7Z file where you have
> that 5 MB COOLDOC.TXT that you want to extract, thanks to some
> sort of table of contents in the file and thanks to having an
> uncompression start point every few MB. Note: I simplify here!

Yes, sometimes it's not possible to decompress several GBs just to
access one file!  ;-)
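Eric's "uncompression start point every few MB" idea can be sketched in a few lines: compress fixed-size chunks independently, keep an index, and decompress only the chunk you need (a toy sketch, not 7-Zip's actual on-disk layout):

```python
import zlib

CHUNK = 4096
data = bytes(range(256)) * 400          # ~100 KB of sample data

# compress each chunk independently and remember where each one starts
chunks = [zlib.compress(data[i:i + CHUNK])
          for i in range(0, len(data), CHUNK)]
index, pos = [], 0
for c in chunks:
    index.append((pos, len(c)))         # (offset, compressed length)
    pos += len(c)
blob = b"".join(chunks)

# to get chunk 7, seek to its offset and decompress only those bytes
start, length = index[7]
part = zlib.decompress(blob[start:start + length])

assert part == data[7 * CHUNK:(7 + 1) * CHUNK]
```

The cost is slightly worse compression (each chunk restarts with an empty dictionary), which is exactly the compact-vs-seekable tradeoff being discussed.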

> PS: Note that the 7ZIP tool HAS an option "delete file from
> archive" but you will see that this is ONLY happy when used
> for ZIP files. When used on 7Z or TGZ, it will work worse.

I hardly ever use that feature, but IIRC that "may" have improved in
newer versions, dunno. Hmmm, seems to work fine for .7z format in 9.20
(under Windows), but that may work slightly worse (or not at all) in a
DOS build without LFNs enabled due to the way temporary files are
currently handled. (IIRC I had to manually use a dopey workaround as
end user, and honestly I wasn't really enthused to dig deeper for a
proper solution.)

Freedos-user mailing list
