Title: RE: [htdig] laola2html.pl

It isn't that; my max_doc_size is 10000000, and the Word docs that show a strange character string as their excerpt come in varying sizes, some smaller than documents that indexed correctly.

Even odder, I finally got catdoc to compile (the Cygwin unistd.h doesn't include getopt.h, so I had to include it myself), and I modified laola2html.pl to use catdoc for the body while still using ldat for the meta information.

I get the same problem, or possibly worse: all the Word docs I can find exhibit it.  Since catdoc is producing the body now, I'm thinking it must be my script somehow, not LAOLA, causing it.  But the script still works on every 'problem' Word doc when run on the command line.  Hmm...

At least I should be able to use catdoc for text-only conversion now (though I probably speak too soon)!


-----Original Message-----
From: David Adams [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 05, 2001 11:40 AM
To: Holmes, Gregory
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] laola2html.pl


Just a thought:  is your problem with laola2html simply that htdig is truncating the document?

Try using the max_doc_size attribute in your configuration file.
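
For illustration only, a minimal sketch of what such a setting might look like, assuming the usual "attribute: value" form of the htdig configuration file. The value shown is the one mentioned in the reply above, not a recommendation:

```
# Maximum number of bytes htdig will read from a single document
max_doc_size: 10000000
```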

--
David Adams
Computing Services
Southampton University
