On 1/2/07, Giorgos Keramidas <[EMAIL PROTECTED]> wrote:
On 2007-01-02 10:20, Kurt Buff <[EMAIL PROTECTED]> wrote:
You can probably use awk(1) or perl(1) to post-process the output of
gzip(1).

The gzip(1) utility, when run with the -cd options, will uncompress the
compressed files and send the uncompressed data to standard output,
without touching the on-disk copy of the compressed data.

It is easy then to pipe the uncompressed data to wc(1) to count the
'bytes' of the uncompressed data:

        for fname in *.Z *.z *.gz; do
                if test -f "${fname}"; then
                        gzip -cd "${fname}" | wc -c
                fi
        done

This will print the byte-size of the uncompressed output of gzip, for
all the files which are currently compressed.  Something like the
following could be its output:

I put together this one-liner after perusing 'man zcat':

find /local/amavis/virusmails -name "*.gz" -print | xargs zcat -l >> out.txt

It puts out multiple instances of stuff like this:

compressed  uncompr. ratio uncompressed_name
    1508      3470  57.0% stuff-7f+BIOFX1-qX
    1660      3576  54.0% stuff-bsFK-yGcWyCm
    9113     17065  46.7% stuff-os1MKlKGu8ky
...
...
...
10214796  17845081  42.7% (totals)
compressed  uncompr. ratio uncompressed_name
    7790     14732  47.2% stuff-Z3UO7-uvMANd
    1806      3705  51.7% stuff-9ADk-DSBFQGQ
    9020     16638  45.8% stuff-Caqfgao-Tc5F
    7508     14361  47.8% stuff-kVUWa8ua4zxc

I'm thinking that piping the output like so:

find /local/amavis/virusmails -name "*.gz" -print | xargs zcat -l |
grep -v compress | grep -v totals

will be enough to suppress the extraneous header/footer info.
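
As a rough sketch of how the listing format shown above could be summarized
directly, the header and "(totals)" lines can be dropped inside awk(1) itself
instead of with grep(1).  The function name summarize_listing is made up for
this example, and the sketch assumes that the second column always holds the
uncompressed size, as in the sample output:

```shell
# Hypothetical helper (not from the original thread): reads `zcat -l`
# style listing lines on stdin and reports min/avg/max of the second
# ("uncompr.") column.
summarize_listing() {
    awk '
    # Skip the repeated header lines and the per-batch "(totals)" footers.
    $1 == "compressed" || $NF == "(totals)" { next }
    # Skip anything whose second field is not a plain number (e.g. "...").
    $2 !~ /^[0-9]+$/ { next }
    {
        size = $2 + 0          # force a numeric value
        n++
        total += size
        if (size > max) max = size
        if (n == 1 || size < min) min = size
    }
    END {
        if (n > 0)
            printf "min/avg/max file size = %d/%d/%d\n", min, total / n, max
    }'
}

# Possible usage, with the directory from the message above:
#   find /local/amavis/virusmails -name "*.gz" -print |
#       xargs zcat -l | summarize_listing
```

Matching on the first and last fields, rather than on the word "compress"
anywhere in the line, avoids dropping a file that happens to have "compress"
in its name.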


This can be piped into awk(1) for further processing, with something
like this:

        for fname in *.Z *.gz; do
                if test -f "$fname"; then
                        gzip -cd "$fname" | wc -c
                fi
        done | \
        awk 'BEGIN {
            min = -1; max = 0; total = 0;
        }
        {
            total += $1;
            if ($1 > max) {
                max = $1;
            }
            if (min == -1 || $1 < min) {
                min = $1;
            }
        }
        END {
            if (NR > 0) {
                printf "min/avg/max file size = %d/%d/%d\n",
                    min, total / NR, max;
            }
        }'

With the same files as above, the output of this would be:

        min/avg/max file size = 220381/1750650/3280920

With a slightly modified awk(1) script, you can even print a running
min/average/max count, following each line.  The modified lines are
marked with a pipe character (`|') in their leftmost column below.  The
'|' characters are *not* part of the script itself.

        for fname in *.Z *.gz; do
                if test -f "$fname"; then
                        gzip -cd "$fname" | wc -c
                fi
        done | \
        awk 'BEGIN {
            min = -1; max = 0; total = 0;
|           printf "%10s %10s %10s %10s\n",
|               "SIZE", "MIN", "AVERAGE", "MAX";
        }
        {
            total += $1;
            if ($1 > max) {
                max = $1;
            }
            if (min == -1 || $1 < min) {
                min = $1;
            }
|           printf "%10d %10d %10d %10d\n",
|               $1, min, total/NR, max;
        }
        END {
            if (NR > 0) {
|               printf "%10s %10d %10d %10d\n",
|                   "TOTAL", min, total / NR, max;
            }
        }'

When run with the same set of two compressed files this will print:

      SIZE        MIN    AVERAGE        MAX
    220381     220381     220381     220381
   3280920     220381    1750650    3280920
     TOTAL     220381    1750650    3280920

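Please note though that awk(1) stores numbers as double-precision
floating point, so with a sufficiently large set of files the total may
lose precision (integers are exact only up to 2^53), and some awk(1)
implementations may also truncate large values printed with %d.  If this
is the case, it should be easy to write an equivalent Perl or Python
script, to take advantage of their big-number support.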
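
If only the grand total is at risk, one possible middle ground before
switching languages is to add the sizes up with the shell's own integer
arithmetic.  This is only a sketch: the function name sum_sizes is made up
for the example, and it assumes a shell whose $(( )) arithmetic is 64 bits
wide, which is common but not guaranteed.  It is not true big-number
support; it merely stays exact for larger totals than a double can hold.

```shell
# Hypothetical helper (not from the original thread): reads one byte
# count per line on stdin, as printed by `wc -c`, and prints the sum.
# Assumes the shell's $(( )) arithmetic uses 64-bit integers.
sum_sizes() {
    total=0
    while read -r size; do
        # Skip blank or non-numeric lines rather than aborting.
        case $size in
            ''|*[!0-9]*) continue ;;
        esac
        total=$((total + size))
    done
    echo "$total"
}

# Possible usage, with the same loop as above:
#   for fname in *.Z *.gz; do
#       if test -f "$fname"; then
#           gzip -cd "$fname" | wc -c
#       fi
#   done | sum_sizes
```

The read builtin strips the leading blanks that some wc(1) implementations
print, so the output of the loop can be fed in unchanged.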

I'll try to parse and understand this, and see if I can modify it to
suit the output I'm currently generating.

Many thanks for the help!

Kurt
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
