> On Thu, 13 May 2004, Douglas Kline wrote:
> 
> > Date: Thu, 13 May 2004 20:24:47 -0400
> > From: Douglas Kline <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Subject: [htdig] Interpreting pdf Files
> > 
> > 
> > In an attempt to process pdf files with ht-Dig version 3.2.0b5, I've added the
> > lines
> > 
> > external_parsers:
> >  application/pdf->text/html   <local directory>/xpdf-3.00/xpdf/pdftotext
> 
> Add a back-slash at the end of the first line to join the two lines:
> 
> external_parsers: \
>  application/pdf->text/html   <local directory>/xpdf-3.00/xpdf/pdftotext


Thanks.  That worked.  I had gotten the idea that back-slashes were needed only
on the continuation lines after the first line following "external_parsers:".
I think this came from a message on this list of Apr. 16 that had it that way,
though other messages in the same thread that day put "external_parsers:" and
the following text on the same line.

Now I'm getting a different error.  It's finding the pdftotext command, but it
outputs the help text you get if you don't give it any arguments.  Evidently
the file to be converted from pdf to text format isn't being passed to it.  The
documentation on external_parsers in the Configuration file format --
Attributes Web page doesn't seem to explain how to pass arguments that refer to
the pages being indexed.  And even if the file is being passed, any further
arguments would vary from one converter to another, so there must be some way
to specify them in the htdig.conf file.  The documentation says you can include
arguments if you quote the whole command string, but how do I indicate the file
to be converted, and where should the output of that command go?

The documentation also says, "Unless it is an external converter, which will
output a document of a different content-type, then its output must follow the
format described here."  I'm guessing that my case is one of the external
converters, so the output doesn't have to conform to that format.  The
documentation also says, "If the second type is user-defined, then it's up to
the converter script to put out a "Content-Type: type" header followed by a
blank line, to indicate to htdig what type it should expect for the output,
much like what a CGI script would do."  Is this a user-defined second type?
I'm guessing that it isn't, since it's plain text.
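
If htdig calls an external converter with the input file name, content-type,
URL and configuration file as its arguments, and reads the converted document
from standard output (both assumptions here, not something I've confirmed),
then pdftotext would be getting more arguments than it accepts, which would
explain the help text.  pdftotext itself takes "[options] <PDF-file>
[<text-file>]", and a text-file of "-" sends the text to stdout.  A small
wrapper that passes along only the first argument might get around this.  The
sketch below is untested; the script name pdf2txt.py and the path
/usr/local/bin/pdftotext are placeholders.  Since pdftotext produces plain
text, text/plain looks like a better target type than text/html:

external_parsers: \
 application/pdf->text/plain   <local directory>/pdf2txt.py

```python
#!/usr/bin/env python3
"""Hypothetical wrapper so htdig can call pdftotext as an external converter.

Assumption: htdig runs the converter as
    <command> <infile> <content-type> <URL> <config-file>
and reads the converted document from the converter's standard output.
"""
import subprocess
import sys

# Placeholder path; point this at wherever pdftotext was built.
PDFTOTEXT = "/usr/local/bin/pdftotext"


def build_command(argv):
    """Return the pdftotext command line for htdig's argument vector."""
    if len(argv) < 2:
        raise ValueError("expected the input file as the first argument")
    infile = argv[1]  # content-type, URL and config file are ignored
    # A text-file argument of "-" makes pdftotext write to standard output.
    return [PDFTOTEXT, infile, "-"]


def main(argv):
    # pdftotext's stdout is inherited, so the text goes straight to htdig.
    return subprocess.call(build_command(argv))


# Only convert when invoked with arguments (as htdig would invoke it).
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```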

TIA.

Douglas 

========
Douglas Kline
[EMAIL PROTECTED]




_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
