In any PDF file there are usually a number (sometimes hundreds) of
lines beginning "/Title", one of which is the title of the PDF in
question. If it has one, that is.
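The attached script, which is really very small, and which I hope
will provide a moment or two's innocent amusement, aims to extract
the right /Title line. It does so by following the /Info reference to
the object it names and looking for /Title only inside that object.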
It seems to work with encoded and un-encoded PDF files with Mac, Unix
and Windows line-breaks (even one file with a mixture of all three)
and runs quite fast.
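
As a quick check of the line-break claim, a title pattern along the lines of the one in the script can be run against the same dictionary written with each kind of line ending. This is only a sketch: the sample dictionary and the helper name title_of are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the /Title string out of an Info dictionary.
sub title_of {
    my ($dict) = @_;
    return $1 if $dict =~ /\/Title\s*\(([^\015\012]*)\)/;
    return;
}

# The same made-up dictionary with Mac (CR), Unix (LF) and Windows (CRLF) breaks.
my %eol = (Mac => "\015", Unix => "\012", Windows => "\015\012");
for my $name (sort keys %eol) {
    my $dict = join $eol{$name},
        '<<', '/Title (A Sample Title)', '/Author (A N Other)', '>>';
    my $title = title_of($dict);
    print "$name: ", defined $title ? $title : 'undefined', "\n";
}
```

Note that the pattern is greedy, so if /Title shares a line with another string entry the capture runs on to the last ")" on that line.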
I would be very grateful to hear from anyone who succeeds in breaking
it or alternatively finds any use for it.
I hope this isn't too far OT...
Alan Fry
-----------
#!perl -w
use strict;
my $start = (times)[0];
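my $f = $ARGV[0];
defined $f or die "Usage: $0 file.pdf\n";
print "$f\n";
open(IN, '<', $f) or die "Cannot open $f: $!\n";
binmode IN;    # PDFs are binary; avoid line-end translation on Windows
read IN, my($str), -s $f;
close IN;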
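# Without an /Info reference there is no information object to search.
unless ($str =~ /\/Info\s+(\d+)\s+0\s+R/) {
    print "/Title undefined\n";
    exit;
}
my $info_block = $1;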
# (?<!\d) stops "2 0 obj" from matching inside "12 0 obj".
my $info_start = $str =~ /(?<!\d)$info_block 0 obj/ ? $-[0] : -1;
my $info_obj = $info_start < 0 ? '' : substr $str, $info_start,
    index($str, ">>", $info_start)-$info_start+2;
# Inside a character class "|" is a literal, so list only CR and LF.
my $title = $info_obj =~ /\/Title\s*\(([^\015\012]*)\)/
    ? "= $1" : 'undefined';
print "/Title $title\n";
my $finish = (times)[0];
print 'Time taken ', $finish-$start, "\n";
------------
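
For anyone wanting to try to break it without a real file to hand, the same steps can be wrapped in a sub and run on a string. What follows is a minimal sketch, not a replacement for the script: the sub name pdf_title and the hand-made fragment are assumptions for illustration, and the fragment is nowhere near a complete PDF. It contains a decoy /Title in object 12 to show why the /Info reference has to be followed.

```perl
use strict;
use warnings;

# Hypothetical sub: the script's steps applied to a string, not a file.
sub pdf_title {
    my ($str) = @_;
    # 1. Find the object number that the /Info entry points at.
    my ($num) = $str =~ /\/Info\s+(\d+)\s+0\s+R/ or return;
    # 2. Locate that object; (?<!\d) rejects "2 0 obj" inside "12 0 obj".
    $str =~ /(?<!\d)$num 0 obj/ or return;
    my $start = $-[0];
    my $end   = index $str, '>>', $start;
    return if $end < 0;
    # 3. Look for /Title inside that object only.
    my $obj = substr $str, $start, $end - $start + 2;
    return $1 if $obj =~ /\/Title\s*\(([^\015\012]*)\)/;
    return;
}

# Made-up fragment: object 12 is a decoy, object 2 is the Info object.
my $pdf = join "\015\012",
    '12 0 obj', '<<', '/Title (Chapter 1)', '/Parent 10 0 R', '>>', 'endobj',
    '2 0 obj', '<<', '/Title (The Real Title)', '/Producer (Some Tool)', '>>',
    'endobj',
    'trailer', '<<', '/Root 1 0 R', '/Info 2 0 R', '>>';

my $title = pdf_title($pdf);
print '/Title ', defined $title ? "= $title" : 'undefined', "\n";
```

The sketch shares the script's assumptions: the generation number is 0, and the Info object contains no nested dictionary ahead of /Title, since the first ">>" is taken as the end of the object.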