Title: laola2html.pl
Believe me, I'm getting ready to spend a bit, but for political reasons on this project it would probably come out of my own pocket!
 
As for catdoc, I simply haven't been able to use it. Neither the current version nor the previous one would compile for me (under Cygwin on NT), and the binary I tried just emits very unhelpful errors.
 
What's frustrating is that LAOLA works like a charm when I run laola2html or doc2html on the command line. Only during a dig, and only inconsistently, do I get an excerpt like the one I showed before (always the same odd characters, BTW). The first, small test dig I did appeared to be fine (which is why I posted the script and reported apparent success, silly me), but when I went to dig our whole intranet site this problem started cropping up.
 
If I take one of the exact Word docs that had the weird excerpt and process it on the command line with laola2html.pl or doc2html.pl (which calls the other), it spits out a perfectly fine web page, one that should be indexed perfectly well if returned to htdig during the dig. Weird.
 
I'll probably end up buying wp2html and using it with your standard scripts. All I can say is that this must be one messed-up file format if it is that difficult to dump text out of it!
-----Original Message-----
From: David Adams [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 05, 2001 10:50 AM
To: Holmes, Gregory
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] laola2html.pl

I have no experience with laola or other Perl software for MS files. I use a combination of wp2html and catdoc, and though I can't recall ever seeing anything like ��ࡱ�, I could well believe that catdoc could produce such output. However, I have no explanation for what you are seeing. Htdig launches an external parser and waits for it to complete, as you know from your experience with word2x :)
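For what it's worth, the odd characters quoted above are consistent with the first eight bytes of an OLE2 compound file (D0 CF 11 E0 A1 B1 1A E1), the container format of Word .doc files, displayed through a lossy UTF-8 decoder. If so, the excerpt may contain the raw document rather than the converter's HTML output. That is a hedged reading, not a confirmed diagnosis. Below is a minimal Python sketch for checking both points, assuming you can get at the bytes htdig actually received; the function names are hypothetical and not part of htdig, LAOLA or catdoc.

```python
# Sketch: recognise the OLE2 Compound File signature used by Word .doc files.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole2(data: bytes) -> bool:
    """True if data starts with the 8-byte OLE2 signature, i.e. it is a raw
    compound document rather than converted HTML or text."""
    return data[:8] == OLE2_SIGNATURE


def garbled_signature() -> str:
    """The signature as a UTF-8 decoder with replacement characters shows it.
    Invalid bytes become U+FFFD; E0 A1 B1 happens to decode to U+0871."""
    return OLE2_SIGNATURE.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(is_ole2(OLE2_SIGNATURE + b"..."))  # raw .doc header
    print(is_ole2(b"<html><body>"))          # parser output
    print(repr(garbled_signature()))
```

Running is_ole2 over whatever htdig stored for an affected URL would show whether the converter's output, or the original file, was indexed.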
 
If your Word documents are Word97 and later you might find the best solution is to spend a few pounds on wp2html.
 
I don't recall you ever giving a reason for rejecting catdoc?
 
--
David Adams
Computing Services
Southampton University
 
