Title: laola2html.pl
That looks interesting. I will certainly look at including your code in the next release of doc2html, but at the moment I'm too pushed for time to do much.
 
Doc2html version 3.0 does have provision for coping when a utility hangs.  You just have to install the small Perl module Sys::AlarmCall.
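
For anyone curious what "coping when a utility hangs" can look like, here is a rough sketch of the general idea using only Perl's built-in alarm, fork and exec.  It is not the doc2html code and it does not use Sys::AlarmCall; the name run_with_timeout and the echo command are made up purely for illustration, and it assumes a Unix-like system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run an external command and give up after
# $timeout seconds.  Returns the command's output, or undef on timeout.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    # Fork a child whose STDOUT is connected to the $out handle.
    my $pid = open(my $out, '-|');
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        exec @cmd or exit 127;    # child: become the utility
    }

    # Parent: read everything, but let SIGALRM abort the read.
    my $text = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        local $/;                 # slurp mode
        my $data = <$out>;
        alarm 0;
        $data;
    };
    my $err = $@;
    alarm 0;                      # make sure no alarm is left pending

    if ($err) {
        die $err unless $err eq "timeout\n";
        kill 'KILL', $pid;        # stop the hung utility
        close $out;               # reap the child
        return undef;
    }
    close $out;
    return $text;
}

# Illustrative call; a real converter command would go here instead.
my $result = run_with_timeout(5, 'echo', 'hello');
print defined $result ? $result : "timed out\n";
```

The point of killing the child on timeout is that the dig can carry on with the next document instead of waiting forever on one bad file.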
 
--
David Adams
Computing Services
Southampton University
----- Original Message -----
Sent: Tuesday, July 03, 2001 7:06 PM
Subject: [htdig] laola2html.pl

[apology for HTML email - MS Exchange ignores my Outlook client's plain text settings - known bug]

I recently asked about using doc2html with word2x, for indexing Word documents.  I have since found that word2x, at least on my system, would hang on some documents and stop the whole dig dead.  This is probably just my system, not word2x.

I have since switched to using the Perl library LAOLA

http://snake.cs.tu-berlin.de:8081/~schwartz/pmh/

which is working like a charm.  I modified David Adams's pdf2html.pl to make a laola2html.pl, complete with title, subject, and keywords extraction.  It's depressing (and sometimes funny) how few authors set these attributes, by the way.  Watch out for "Sample Manual Title" and such ;)

Anyway, I attach it here for anyone to use, and I'd welcome any suggestions for improvement.  As I said, I basically hacked up pdf2html.pl, so I'm sure this could be optimized and improved.

-Greg Holmes
