On 1/3/07, Ian Smith <[EMAIL PROTECTED]> wrote:
> Message: 17
> Date: Tue, 2 Jan 2007 19:50:01 -0800
> From: James Long <[EMAIL PROTECTED]>
> > Message: 28
> > Date: Tue, 2 Jan 2007 10:20:08 -0800
> > From: "Kurt Buff" <[EMAIL PROTECTED]>
> > I don't even have a clue how to start this one, so am looking for a little help.
> >
> > I've got a directory with a large number of gzipped files in it (over
> > 110k) along with a few thousand uncompressed files.
If it were me I'd mv those into a bunch of subdirectories; things get
really slow with more than 500 or so files per directory .. anyway ..
I just store them for a while - delete them after two weeks if they're
not needed again. The overhead isn't enough to worry about at this
point.
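
If it ever does get painful, I suppose something along these lines would
shard them out into one subdirectory per leading character (just a sketch,
not something I've actually run against that directory):

#!/bin/sh
# sketch: move each regular file in the current directory into a
# subdirectory named after its first character
for f in *; do
    [ -f "$f" ] || continue           # skip subdirectories and the like
    d=$(echo "$f" | cut -c1)          # bucket name = first character
    mkdir -p "$d"
    mv "$f" "$d/"
done
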
> > I'd like to find the average uncompressed size of the gzipped files,
> > and ignore the uncompressed files.
> >
> > How on earth would I go about doing that with the default shell (no
> > bash or other shells installed), or in perl, or something like that.
> > I'm no scripter of any great expertise, and am just stumbling over
> > this trying to find an approach.
> >
> > Many thanks for any help,
> >
> > Kurt
>
> Hi, Kurt.
And hi, James,
> Can I make some assumptions that simplify things? No kinky filenames,
> just [a-zA-Z0-9.]. My approach specifically doesn't like colons or
> spaces, I bet. Also, you say gzipped, so I'm assuming it's ONLY gzip,
> no bzip2, etc.
>
> Here's a first draft that might give you some ideas. It will output:
>
> foo.gz : 3456
> bar.gz : 1048576
> (etc.)
>
> find . -type f | while read fname; do
> file $fname | grep -q "compressed" && echo "$fname : $(zcat $fname | wc -c)"
> done
% file cat7/tuning.7.gz
cat7/tuning.7.gz: gzip compressed data, from Unix
Good check, though grepping for "gzip compressed" rather than plain "compressed" is what keeps bzip2 etc. out.
But you REALLY don't want to zcat 110 thousand files just to wc 'em,
unless it's a benchmark :) .. may I suggest a slight speedup, template:
% gunzip -l cat7/tuning.7.gz
compressed  uncompr.  ratio  uncompressed_name
     13642     38421  64.5%  cat7/tuning.7
> If you really need a script that will do the math for you, then
> pipe the output of this into bc:
>
> #!/bin/sh
>
> find . -type f | {
>
> n=0
> echo scale=2
> echo -n "("
> while read fname; do
- > if file $fname | grep -q "compressed"
+ if file $fname | grep -q "gzip compressed"
> then
- > echo -n "$(zcat $fname | wc -c)+"
+ echo -n "$(gunzip -l $fname | grep -v comp | awk '{print $2}')+"
> n=$(($n+1))
> fi
> done
> echo "0) / $n"
>
> }
>
> That should give you the average decompressed size of the gzip'ped
> files in the current directory.
HTH, Ian
Ah - yes, I think that's much better. I should have thought of awk.
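
For the archives, folding your edits into James's script (and tacking the bc
onto the end) comes out something like this - a sketch only, still assuming
plain gzip files and tame filenames, and keeping in mind that gzip only
records sizes modulo 4GB:

#!/bin/sh
# Print the average uncompressed size of the gzip'd files under the
# current directory, using gzip's own size field instead of zcat | wc -c.

find . -type f | {
    n=0
    echo "scale=2"
    echo -n "("
    while read fname; do
        if file "$fname" | grep -q "gzip compressed"
        then
            # second column of gunzip -l's data line is the uncompressed size
            echo -n "$(gunzip -l "$fname" | grep -v comp | awk '{print $2}')+"
            n=$(($n+1))
        fi
    done
    echo "0) / $n"
} | bc

(The grep -v comp is your header filter, so a pathname containing "comp"
would get filtered out along with the header - not a worry with these
filenames.)
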
At some point, I'd like to do a bit more processing of file sizes,
such as trying to find out the number of IP packets each file would
take during an SMTP transaction, so that I could categorize overhead a
bit, but for now the average uncompressed file size is good enough.
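
If I ever get around to it, my first stab would probably be plain
back-of-the-envelope arithmetic, something like the sketch below - the 4/3
base64 expansion and the 1460 data bytes per segment are guesses on my part,
and SMTP chatter, mail headers and retransmits are all ignored:

#!/bin/sh
# rough_packets.sh <uncompressed-byte-count>   (name made up for the example)
# Guess how many TCP segments mailing a file of that size might take.
bytes=$1
awk -v b="$bytes" 'BEGIN {
    wire = b * 4 / 3                  # assumed base64 expansion
    pkts = int(wire / 1460)           # assumed ~1460 data bytes per segment
    if (wire > pkts * 1460) pkts++    # round up to whole segments
    print pkts
}'
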
Thanks again for your help!
Kurt