Believe me, I'm getting ready to spend a bit, but for political reasons on this
project, it would probably come out of my own pocket!

As for catdoc, I simply haven't been able to use it. Neither the current nor the
previous version would compile for me (on Cygwin on NT), and the binary I tried
just emits very unhelpful errors.

What's frustrating is that LAOLA works like a charm using laola2html or doc2html
on the command line. It is only during a dig, and inconsistently, that I get an
excerpt like the one I showed before (always the same exact odd characters, BTW).
The first, small test dig I did appeared to be fine (which is why I posted the
script and reported seeming success, silly me), but when I went to dig our whole
intranet site this thing started cropping up.

If I take one of the exact Word docs that had the weird excerpt and process it
on the command line with laola2html.pl or doc2html.pl (one calling the other), it
spits out a perfectly fine web page (one that should be indexed perfectly well
if returned to htdig during the dig). Weird.

I'll probably end up buying wp2html and using it with your standard scripts. All
I can say is this must be one messed-up file format if it is that difficult to
dump text out of it!
Title: laola2html.pl

