Fwd: checking gzip files

2005-12-30 Thread S Khadar
Hi all,

I have 15 thousand directories - each of them contains, say, 10 files (all
*.gz).
Out of these 10 *.gz files, I want to check whether a file named foo.gz
contains any content or not - since my files are gzipped, even a blank file
occupies some size.

have a look at my code
-
#!/usr/bin/perl
use Shell;

$dir = shift;
$dir ="/home/trial";
opendir(M,"$dir");
@a = readdir(M);
close M;

open(KL,">Dumo-chk");
print KL "List of foo.gz with no contents  \n";

foreach (@a)
{
print "\n\t\tProcessing $_\n";
print "\tUncompressing and Reading foo.gz\n";
next if ($_ =~ /^\./);
# my code works fine until here --- here I have the problem
$dmchk=zless("$dir/$_/foo.gz");

if (-z "$dmchk")
   {
   print KL "\n$_  No - content\n";
   }
else
   {
   print KL "$_ -foo-content\n";
   }
   }
}


- I am sure there is more than one way to do it
- thanks in advance
Happy PERL
iBioKid - S K


Re: checking gzip files

2005-12-30 Thread Tom Phoenix
On 12/30/05, S Khadar [EMAIL PROTECTED] wrote:
> $dir = shift;
> $dir ="/home/trial";

You seem to be over-writing what you just put into $dir. (I think this
is just your debugging code, though.)

> opendir(M,"$dir");

Just as when using open(), it's important to check for errors with
opendir() by using "or die". The double quote marks around the
variable name are superfluous.

opendir(M, $dir) or die "Can't opendir '$dir': $!";

> $dmchk=zless("$dir/$_/foo.gz");
>
> if (-z "$dmchk")

That test is looking to see whether the file named by $dmchk is a
zero-byte file. (Again, the double quote marks are superfluous around
a simple scalar name.) So, if that variable contains "fred", and
there's an empty file named fred in the current directory, it returns
true. That's probably not what you meant to do. (Did you want to use
regular expressions to examine the content of $dmchk?)
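
To illustrate the point about -z: the operator treats its operand as a file name and tests that file on disk; it never looks at the string's own content. A minimal sketch, assuming a throwaway temp directory and the made-up file name "fred" (neither is from the original code):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory, removed automatically when the program exits.
my $tmp = tempdir(CLEANUP => 1);

# Create an empty file called "fred" (hypothetical name, for illustration).
my $name = "$tmp/fred";
open(my $fh, '>', $name) or die "Can't create '$name': $!";
close($fh);

# -z stats the file named by its operand.
my $empty_file_on_disk = -z $name ? 1 : 0;                   # exists, zero bytes
my $program_output     = -z "some program output" ? 1 : 0;   # no file by that name

print "fred: $empty_file_on_disk, output string: $program_output\n";
```

So storing a program's output in $dmchk and then applying -z to it asks about a file whose name happens to be that output, which is almost never the intended question.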

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response




Re: checking gzip files

2005-12-30 Thread Adriano Ferreira
On 12/30/05, S Khadar [EMAIL PROTECTED] wrote:
> #!/usr/bin/perl
> use Shell;
...
> $dmchk=zless("$dir/$_/foo.gz");

As an aside, "perldoc Shell" advises against this style [ "use
Shell;" with no import list ]. Prefer this:

  use Shell qw(zless);

so that you know that you are not calling some program by mistake/typo.

From the man page of gzip, the command

  zcat $gzfile | wc -c

is recommended to get the uncompressed file size.

You can use it in backticks, like `zcat $gzfile | wc -c`, and then
check the returned number.
But this is rather expensive for large files, since you just want to
know if it has zero bytes or not. Ah, you can use 'zcat' where you are
using 'zless' with the same effect but without a pipe to the unix
command 'less'.

A pure Perl solution would be to use Compress::Zlib (which you
probably already have - for example if you use CPAN) and use a function
like
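
A rough sketch of the backtick approach, for what it is worth. It assumes gzip and wc are on the PATH and uses "gzip -dc", which is equivalent to zcat; the function name uncompressed_size is made up for this example, and the path is assumed to contain no single quotes:

```perl
use strict;
use warnings;

# Return the uncompressed size in bytes of a gzip file by running the
# pipeline from the gzip man page. The path is single-quoted for the shell,
# so this sketch assumes it contains no single quotes itself.
sub uncompressed_size {
    my ($gzfile) = @_;
    my $out    = `gzip -dc '$gzfile' | wc -c`;
    my $status = $?;                      # exit status of the pipeline
    my ($bytes) = $out =~ /(\d+)/;        # wc may pad its output with spaces
    die "pipeline failed for '$gzfile'"
        unless $status == 0 && defined $bytes;
    return $bytes;
}
```

A caller would then test uncompressed_size($path) == 0. Note that this starts a shell plus two processes per file, which adds up over 15 thousand directories.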

use Compress::Zlib;

sub zz {
    my $f = shift;
    my $gz = gzopen($f, 'r') or die; # error handling left as an exercise
    # try to read just one byte; the data itself is discarded and only the
    # return value matters - true for empty, false otherwise
    return ! $gz->gzread(my $buf, 1);
}

my $f = 'a.gz';
print zz($f) ? 'zero bytes' : 'non-empty';

In this case, there is no longer any need for "use Shell", unless you use
some other external utility.
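
Putting the pieces of this thread together, the whole job might look roughly like the sketch below. It assumes the layout described in the original post ($dir/<subdir>/foo.gz); the function names gz_is_empty and empty_foo_dirs are invented for this example, and subdirectories without a foo.gz are silently skipped, which may or may not be what is wanted:

```perl
use strict;
use warnings;
use Compress::Zlib;

# True if the gzip file decompresses to zero bytes.
sub gz_is_empty {
    my ($file) = @_;
    my $gz = gzopen($file, 'rb') or die "Can't gzopen '$file': $gzerrno";
    my $n = $gz->gzread(my $buf, 1);    # try to read a single byte
    $gz->gzclose;
    die "Error reading '$file'" if $n < 0;
    return $n == 0;
}

# Return the names of the subdirectories of $dir whose foo.gz is empty.
sub empty_foo_dirs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't opendir '$dir': $!";
    my @subdirs = grep { !/^\./ && -d "$dir/$_" } readdir($dh);
    closedir($dh);
    return grep { -e "$dir/$_/foo.gz" && gz_is_empty("$dir/$_/foo.gz") }
           sort @subdirs;
}

# Usage: perl check.pl /home/trial
if (@ARGV) {
    print "$_: foo.gz has no content\n" for empty_foo_dirs($ARGV[0]);
}
```

Only the first byte of each file is decompressed, so the cost per file stays small even when foo.gz is large.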

Regards,
Adriano.





Re: checking gzip files

2005-12-30 Thread Xavier Noria

On Dec 30, 2005, at 13:14, S Khadar wrote:


> Hi all,
>
> I have 15 thousand directories - each of them contains, say, 10 files
> (all *.gz).
> Out of these 10 *.gz files, I want to check whether a file named foo.gz
> contains any content or not - since my files are gzipped, even a blank
> file occupies some size.


Even if gzipped files always have more than 0 bytes, wouldn't it be
true that all empty gzipped files have the same size, and that
non-empty gzipped files are larger than that minimum? On this Mac that
size seems to be 24 bytes.


If that were the case you could use a regular -s instead of that z*
trickery, which multiplied by thousands of directories will make a
difference.


-- fxn

% touch foo
% ls -l foo
-rw-r--r--   1 fxn  staff  0 Dec 30 20:26 foo
% gzip foo
% ls -l foo.gz
-rw-r--r--   1 fxn  staff  24 Dec 30 20:26 foo.gz
% echo x > bar
% ls -l bar
-rw-r--r--   1 fxn  staff  2 Dec 30 20:27 bar
% gzip bar
% ls -l bar.gz
-rw-r--r--   1 fxn  staff  26 Dec 30 20:27 bar.gz






Re: checking gzip files

2005-12-30 Thread Adriano Ferreira
On 12/30/05, Xavier Noria [EMAIL PROTECTED] wrote:
> Even if gzipped files always have more than 0 bytes, wouldn't it be
> true that all empty gzipped files have the same size, and that
> non-empty gzipped files are larger than that minimum? On this Mac that
> size seems to be 24 bytes.

Nope. Gzipped files have a header which may include the filename.

$ touch foo
$ ls -l foo
-rw-r--r--  1 me mine 0 Dec 30  2005 foo
$ gzip foo
$ ls -l foo.gz
-rw-r--r--  1 me mine 24 Dec 30 18:04 foo.gz
$ touch foobar
$ ls -l foobar
-rw-r--r--  1 me mine 0 Dec 30  2005 foobar
$ gzip foobar
$ ls -l foobar.gz
-rw-r--r--  1 me mine 27 Dec 30 18:04 foobar.gz

Well - it looks like an empty gzipped file with a name takes
21 + length($name) bytes, but that's not reliable since the header size
may vary.

$ touch foo
$ ls -l foo
-rw-r--r--  1 me mine 0 Dec 30  2005 foo
$ gzip -n foo # omit name from header
$ ls -l foo.gz
-rw-r--r--  1 me mine 20 Dec 30  2005 foo.gz

To be honest, the gzip file specification does include a field with the
size of the uncompressed data - but that's what
zlib/zcat/Compress::Zlib read for us to tell whether the file is empty.
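
For the curious, that field can also be read directly. Per RFC 1952 it is called ISIZE: the last 4 bytes of the file, little-endian, holding the uncompressed size modulo 2**32. The sketch below is only an illustration under assumptions: the function name gzip_isize is made up, and the result is only meaningful for intact, single-member gzip files (for concatenated members it describes the last member only):

```perl
use strict;
use warnings;

# Read the ISIZE field from the trailer of a gzip file: the final 4 bytes,
# a little-endian unsigned integer (uncompressed size modulo 2**32).
sub gzip_isize {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or die "Can't open '$file': $!";
    seek($fh, -4, 2) or die "Can't seek in '$file': $!";   # 2 = SEEK_END
    my $got = read($fh, my $raw, 4);
    close($fh);
    die "Short read on '$file'" unless defined $got && $got == 4;
    return unpack('V', $raw);    # 'V' = unsigned 32-bit little-endian
}
```

A check such as gzip_isize("$dir/$_/foo.gz") == 0 needs no decompression at all, but unlike the name-dependent file size it does not vary with the header contents.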
