In any PDF file there are usually a number (sometimes hundreds) of 
lines beginning "/Title", one of which is the title of the PDF in 
question. If it has one, that is.

The attached script, which is really very small, and which I hope 
will provide a moment or two's innocent amusement, aims to extract 
the right /Title line.

It seems to work with encoded and un-encoded PDF files with Mac, Unix 
and Windows line-breaks (even one file with a mixture of all three) 
and runs quite fast.

I would be very grateful to hear from anyone who succeeds in breaking 
it or alternatively finds any use for it.

I hope this isn't too far OT...

Alan Fry

-----------

#!perl -w
use strict;

my $start = (times)[0];

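my $f = $ARGV[0];
defined $f or die "Usage: $0 file.pdf\n";
print "$f\n";

# Slurp the whole file in binary mode so no line-ending translation is applied.
open(IN, '<', $f) or die "Cannot open $f: $!\n";
binmode IN;
read IN, my($str), -s $f;
close IN;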

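# The /Info entry gives the number of the object that holds the metadata.
my ($info_block) = $str =~ /\/Info\s+(\d+)\s+0\s+R/;
unless (defined $info_block) {
    print "/Title undefined\n";   # no /Info reference, so no title to report
    exit;
}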

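# Require a non-digit before the number so "2 0 obj" is not found inside "12 0 obj".
my $info_start = $str =~ /(?<!\d)$info_block\s+0\s+obj/ ? $-[0] : -1;
my $info_end   = $info_start < 0 ? -1 : index($str, ">>", $info_start);
my $info_obj   = $info_end < 0 ? ''
               : substr($str, $info_start, $info_end-$info_start+2);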

# Capture up to the last ")" on the /Title line; a CR or LF ends the line.
my $title = $info_obj =~ /\/Title\s*\(([^\015\012]*)\)/
                          ? "= $1" : 'undefined';
print "/Title $title\n";

my $finish = (times)[0];
print 'Time taken ', $finish-$start, "\n";

------------
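
One case that might break the pattern above is an Info dictionary written on a single line, where other entries such as /Author follow /Title, or a title that itself contains parentheses. The following is only a sketch of one possible way round that, assuming Perl 5.10 or later for the recursive named group; the helper name title_from_info is made up for illustration and is not part of the script. It does not undo backslash escapes and does not handle titles stored as hex strings.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the first /Title literal string
# in $dict, or undef if there is none. Balanced and backslash-escaped
# parentheses inside the title are kept as part of it.
sub title_from_info {
    my ($dict) = @_;
    return undef unless $dict =~ m{
        /Title \s*
        (?<lit> \( (?: [^()\\]++     # ordinary characters
                     | \\.           # a backslash escape such as \) or \\
                     | (?&lit)       # a nested, balanced (...) group
                   )*+ \) )
    }xs;
    my $lit = $+{lit};
    return substr $lit, 1, length($lit) - 2;   # drop the outer parentheses
}

my $sample = "<< /Title (A (nested) title) /Author (Someone) >>";
my $t = title_from_info($sample);
print defined $t ? "/Title = $t\n" : "/Title undefined\n";
# prints "/Title = A (nested) title"
```

The same $info_obj string built in the script could be passed to this helper in place of the single-line pattern.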


