Fwd: checking gzip files
Hi all,

I have about 15 thousand directories, each containing around 10 files (all gzipped, *.gz). In each directory I want to check whether the file named foo.gz has any content or not. Since my files are gzipped, even a blank file occupies some space on disk.

Have a look at my code:

    #!/usr/bin/perl
    use Shell;
    $dir = shift;
    $dir = "/home/trial";
    opendir(M, "$dir");
    @a = readdir(M);
    close M;
    open(KL, ">Dumo-chk");
    print KL "List of foo.gz with no contents \n";
    foreach (@a) {
        print "\n\t\tProcessing $_\n";
        print "\tUncompressing and Reading foo.gz\n";
        next if ($_ =~ /^\./);
        # my code works fine until here --- here I have the problem
        $dmchk = zless("$dir/$_/foo.gz");
        if (-z "$dmchk") {
            print KL "\n$_ No - content\n";
        } else {
            print KL "$_ -foo-content\n";
        }
    }

I am sure there is more than one way to do it.

Thanks in advance,
Happy PERL
iBioKid - S K
Re: checking gzip files
On 12/30/05, S Khadar [EMAIL PROTECTED] wrote:

> $dir = shift;
> $dir = "/home/trial";

You seem to be overwriting what you just put into $dir. (I think this is just your debugging code, though.)

> opendir(M, "$dir");

Just as when using open(), it's important to check for errors with opendir() by using "or die". The double quote marks around the variable name are superfluous.

    opendir(M, $dir) or die "Can't opendir '$dir': $!";

> $dmchk = zless("$dir/$_/foo.gz");
> if (-z "$dmchk")

That test is looking to see whether the file named by $dmchk is a zero-byte file. (Again, the double quote marks are superfluous around a simple scalar name.) So, if that variable contains "fred", and there's an empty file named fred in the current directory, it returns true. That's probably not what you meant to do. (Did you want to use regular expressions to examine the content of $dmchk?)

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response
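A minimal sketch of the error-checked directory scan discussed in this reply. The sub name dirs_with_foo is hypothetical, and the sketch assumes that every directory of interest sits directly under $dir. Note that -f, like -z, tests a path on disk; neither operator looks at decompressed data held in a variable.

```perl
use strict;
use warnings;

# Return the names of the subdirectories of $dir that contain a foo.gz.
# (dirs_with_foo is a hypothetical name used only for this illustration.)
sub dirs_with_foo {
    my ($dir) = @_;

    # Check opendir for errors, as recommended for open().
    opendir(my $dh, $dir) or die "Can't opendir '$dir': $!";

    # Skip dot entries, keep only entries that hold a regular file foo.gz.
    my @names = grep { !/^\./ && -f "$dir/$_/foo.gz" } readdir($dh);

    # A handle opened with opendir is closed with closedir, not close.
    closedir($dh);

    my @sorted = sort @names;
    return @sorted;
}
```

Called as dirs_with_foo('/home/trial'), it returns the sorted list of directory names whose foo.gz still has to be checked for content.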
Re: checking gzip files
On 12/30/05, S Khadar [EMAIL PROTECTED] wrote:

> #!/usr/bin/perl
> use Shell;
> ...
> $dmchk = zless("$dir/$_/foo.gz");

As an aside, "perldoc Shell" advises against this style (a bare "use Shell;" with no import list). Prefer this:

    use Shell qw(zless);

so that you know that you are not calling some program by mistake or typo.

The gzip man page recommends the command

    zcat $gzfile | wc -c

to get the uncompressed file size. You can use it in backticks, like `zcat $gzfile | wc -c`, and then check the returned number. But this is rather expensive for large files, since the whole file is decompressed and you just want to know whether it has zero bytes or not.

Also, you can use 'zcat' where you are using 'zless', with the same effect but without a pipe to the unix command 'less'.

A Perl-only solution (no external commands) would be to use Compress::Zlib, which you probably have already (for example, if you use CPAN), with a function like this:

    use Compress::Zlib;

    sub zz {
        my $f = shift;
        my $gz = gzopen($f, 'r') or die;  # error handling left as an exercise
        # Try to read a single byte; the byte itself is discarded.
        # Only the return value matters: true for empty, false otherwise.
        return ! $gz->gzread(my $buf, 1);
    }

    my $f = 'a.gz';
    print zz($f) ? 'zero bytes' : 'non-empty';

In this case there is no longer any need for "use Shell", unless you use some other external utility.

Regards,
Adriano.
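The error handling left as an exercise in zz could be filled in along these lines. This is only a sketch: the name gz_is_empty is hypothetical, and it relies on gzread returning the number of bytes read, 0 at end of file and a negative value on error, with $gzerrno (exported by Compress::Zlib) describing the failure.

```perl
use strict;
use warnings;
use Compress::Zlib;   # exports gzopen and $gzerrno

# Return 1 if the gzip file decompresses to zero bytes, 0 if it has content.
# Dies if the file cannot be opened or read.
# (gz_is_empty is a hypothetical name used only for this illustration.)
sub gz_is_empty {
    my ($file) = @_;

    my $gz = gzopen($file, 'rb')
        or die "Can't gzopen '$file': $gzerrno";

    # Ask for a single byte: enough to tell empty from non-empty
    # without decompressing the rest of the file.
    my $n = $gz->gzread(my $buf, 1);
    die "Error reading '$file': $gzerrno" if $n < 0;

    $gz->gzclose();
    return $n == 0 ? 1 : 0;
}
```

Inside the original loop, the test would then read something like: if (gz_is_empty("$dir/$_/foo.gz")) { print KL "$_ No - content\n" }.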
Re: checking gzip files
On Dec 30, 2005, at 13:14, S Khadar wrote:

> Hi all,
>
> I have a 15 thousand directories - each of them contain say 10 files
> (all *.gzip) out of this 10 *.gz files - I want to check whether a file
> named foo.gz contain any content or not - since my files are gzipped
> even the blank file occupies some size.

Even if gzipped files always have more than 0 bytes, wouldn't it be true that all empty gzipped files have the same size, and that non-empty gzipped files are larger than that minimum? On this Mac that size seems to be 24 bytes.

If that were the case you could use a regular -s instead of that z* trickery, which, multiplied by thousands of directories, will make a difference.

-- fxn

    % touch foo
    % ls -l foo
    -rw-r--r-- 1 fxn staff 0 Dec 30 20:26 foo
    % gzip foo
    % ls -l foo.gz
    -rw-r--r-- 1 fxn staff 24 Dec 30 20:26 foo.gz
    % echo x > bar
    % ls -l bar
    -rw-r--r-- 1 fxn staff 2 Dec 30 20:27 bar
    % gzip bar
    % ls -l bar.gz
    -rw-r--r-- 1 fxn staff 26 Dec 30 20:27 bar.gz
Re: checking gzip files
On 12/30/05, Xavier Noria [EMAIL PROTECTED] wrote:

> Even if gzipped files have always more than 0 bytes, wouldn't it be
> true than all empty gzipped files have the same size, and that non-
> empty gzipped files are greater than that minimum? In this Mac that
> size seems to be 24 bytes.

Nope. Gzipped files have a header which may include the original filename.

    $ touch foo
    $ ls -l foo
    -rw-r--r-- 1 me mine 0 Dec 30 2005 foo
    $ gzip foo
    $ ls -l foo.gz
    -rw-r--r-- 1 me mine 24 Dec 30 18:04 foo.gz
    $ touch foobar
    $ ls -l foobar
    -rw-r--r-- 1 me mine 0 Dec 30 2005 foobar
    $ gzip foobar
    $ ls -l foobar.gz
    -rw-r--r-- 1 me mine 27 Dec 30 18:04 foobar.gz

Well, it looks like an empty gzipped file with a name takes 21 + length($name) bytes (24 for foo, 27 for foobar), but that's not reliable since the header size may vary.

    $ touch foo
    $ ls -l foo
    -rw-r--r-- 1 me mine 0 Dec 30 2005 foo
    $ gzip -n foo    # omit name from header
    $ ls -l foo.gz
    -rw-r--r-- 1 me mine 20 Dec 30 2005 foo.gz

To be fair, the gzip file format does include a field holding the size of the uncompressed data, but that is the kind of detail zlib/zcat/Compress::Zlib handle for us when we ask whether the file is empty.
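For illustration, the size field mentioned above can be read directly. In the gzip format (RFC 1952) it is called ISIZE and occupies the last four bytes of the file, stored little-endian, holding the uncompressed length modulo 2**32. The sketch below uses a hypothetical sub name, gz_isize, and assumes a single-member gzip file; for concatenated gzip members the field describes only the last member, and a result of 0 is ambiguous for inputs whose size is an exact multiple of 4 GiB.

```perl
use strict;
use warnings;

# Return the ISIZE trailer field of a gzip file: the uncompressed length
# modulo 2**32, stored little-endian in the last four bytes (RFC 1952).
# (gz_isize is a hypothetical name used only for this illustration.)
sub gz_isize {
    my ($file) = @_;

    open(my $fh, '<', $file) or die "Can't open '$file': $!";
    binmode($fh);

    # Position four bytes before the end of the file (whence 2 = SEEK_END).
    seek($fh, -4, 2) or die "Can't seek in '$file': $!";

    my $n = read($fh, my $raw, 4);
    die "Can't read trailer of '$file'" unless defined $n && $n == 4;
    close($fh);

    return unpack('V', $raw);   # 32-bit little-endian unsigned integer
}
```

Because it reads only four bytes per file and does not depend on the header length, a check like gz_isize("$dir/$_/foo.gz") == 0 stays cheap across thousands of directories. It trusts the trailer rather than decompressing, so it is less robust against truncated or corrupt files than reading through Compress::Zlib.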