At 4:12 PM -0400 8/20/01, Steele, David S. wrote:
>htdig uses parse_doc.pl to call pdftotext, and this guy does a lot to the
>output of the pdf parser. I don't see why these modifications are needed.

The parse_doc script is an external parser rather than an external 
converter, and as such it must mark up its output in the format the 
ExternalParser code expects: 
<http://www.htdig.org/attrs.html#external_parsers>

More useful for your purposes would be the conv_doc.pl script, which 
is an external converter.

>- If pdftohtml produces a single clean html file from PDF input, can I make
>this work by just identifying it as the external_handler for pdf (with
>appropriate arguments)?

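Not quite, but close. The attribute is external_parsers, and a 
converter is listed there with a content-type pair such as 
application/pdf->text/html rather than a single type.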

>- Does something else have to be done to tell htdig to process the file as
>html, with links?

See the description of the external converter features in the 
documentation URL above.
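
As a rough illustration of the converter approach, here is a minimal 
sketch of a wrapper script, not a tested recipe. It assumes ht://Dig 
calls an external converter with the input file, content-type, URL and 
config file as arguments and reads the converted document from stdout, 
as the documentation above describes. The script name, its install 
path and the exact pdftohtml flags are assumptions; check them against 
your pdftohtml version.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper so ht://Dig can use pdftohtml as an external converter.

Assumed htdig.conf entry (the script path is a placeholder):
    external_parsers: application/pdf->text/html /usr/local/bin/pdf2html_wrapper.py
"""
import subprocess
import sys


def build_command(pdf_path):
    """Return the pdftohtml argv for a single HTML document on stdout.

    Flags are assumed from pdftohtml's usage text; verify them locally:
    -q (quiet), -i (ignore images), -noframes (one HTML file),
    -stdout (write to standard output).
    """
    return ["pdftohtml", "-q", "-i", "-noframes", "-stdout", pdf_path]


def main(argv):
    # ht://Dig passes: input file, content-type, URL, config file.
    if len(argv) < 4:
        sys.stderr.write("usage: wrapper infile content-type url [config]\n")
        return 2
    infile, content_type, _url = argv[1:4]
    if content_type != "application/pdf":
        sys.stderr.write("unexpected content-type: %s\n" % content_type)
        return 1
    # The converter's stdout goes straight back to htdig, which then
    # handles it as text/html, links included.
    return subprocess.call(build_command(infile))


# Only run as a converter when arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The wrapper exists only because pdftohtml does not accept ht://Dig's 
argument order directly. Since the output is declared as text/html, no 
extra step should be needed for htdig to treat it as HTML.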

-- 
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html