On Sat, 15 May 2004, Douglas Kline wrote:

> Date: Sat, 15 May 2004 19:46:17 -0400
> From: Douglas Kline <[EMAIL PROTECTED]>
> To: Joe R. Jah <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] Interpreting pdf Files
>
> > On Thu, 13 May 2004, Douglas Kline wrote:
> >
> > > Date: Thu, 13 May 2004 20:24:47 -0400
> > > From: Douglas Kline <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Subject: [htdig] Interpreting pdf Files
> > >
> > >
> > > In an attempt to process pdf files with ht-Dig version 3.2.0b5, I've added the
> > > lines
> > >
> > > external_parsers:
> > > application/pdf->text/html <local directory>/xpdf-3.00/xpdf/pdftotext
> >
> > Add a back-slash at the end of the first line to join the two lines:
> >
> > external_parsers: \
> > application/pdf->text/html <local directory>/xpdf-3.00/xpdf/pdftotext
>
> Thanks. That worked. I got the idea that the back-slashes were needed for
> continuation lines after the first line after the "external_parsers:" line. I
> think this was from a message on this list of Apr. 16 which had it that way but
> other messages from the same day on the same thread had "external_parsers:" and
> the following text on the same line.
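Why the back-slash fixed it: an attribute value ends at the line break unless that line ends with a back-slash, so without it "external_parsers:" gets an empty value and the next line stands on its own. The Python sketch here is only a toy model of that joining rule, not ht://Dig's own configuration reader; in particular, skipping a line that has no "name:" part is an assumption made for illustration.

```python
def join_continuations(text: str) -> list[str]:
    """Join physical lines that end in a back-slash into logical lines."""
    logical: list[str] = []
    pending = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # Drop the back-slash; the value continues on the next line.
            pending += line[:-1].rstrip() + " "
            continue
        logical.append((pending + line).strip())
        pending = ""
    if pending:
        # A back-slash on the very last line has nothing to join with.
        logical.append(pending.strip())
    return logical


def parse_attributes(text: str) -> dict[str, str]:
    """Split each logical line at its first colon into name and value."""
    attrs: dict[str, str] = {}
    for line in join_continuations(text):
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            # Assumption for this toy model: a line without "name:" is skipped.
            continue
        attrs[name.strip()] = value.strip()
    return attrs


broken = ("external_parsers:\n"
          "application/pdf->text/html <local directory>/xpdf-3.00/xpdf/pdftotext\n")
fixed = ("external_parsers: \\\n"
         "application/pdf->text/html <local directory>/xpdf-3.00/xpdf/pdftotext\n")

print(parse_attributes(broken))  # {'external_parsers': ''}
print(parse_attributes(fixed))   # value is the whole second line
```

The same rule covers a longer list: every line except the last ends with a back-slash, and all of them collapse into one value.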
If you put "external_parsers:" and its first entry on the same line, that's
fine too; but if the value runs over several lines, every line except the last
must end with a back-slash. For example:

external_parsers: \
application/rtf->text/html /usr/local/bin/doc2html.pl \
text/rtf->text/html /usr/local/bin/doc2html.pl \
application/pdf->text/html /usr/local/bin/doc2html.pl \
application/postscript->text/html /usr/local/bin/doc2html.pl \
application/msword->text/html /usr/local/bin/doc2html.pl \
application/wordperfect5.1->text/html /usr/local/bin/doc2html.pl \
application/msexcel->text/html /usr/local/bin/doc2html.pl \
application/vnd.ms-excel->text/html /usr/local/bin/doc2html.pl \
application/vnd.ms-powerpoint->text/html /usr/local/bin/doc2html.pl \
application/x-shockwave-flash->text/html /usr/local/bin/doc2html.pl \
application/x-shockwave-flash2-preview->text/html /usr/local/bin/doc2html.pl

You may want to install doc2html.pl:

http://www.htdig.org/FAQ.html#q4.9

Regards,

Joe

> Now I'm getting a different error. It's finding the pdftotext command but
> outputting the help text you get if you don't give it any arguments. Evidently
> it isn't being passed the file to be converted from pdf to text format. The
> documentation on external_parsers in the Configuration file format --
> Attributes Web page doesn't seem to deal with passing arguments that refer to
> the pages being indexed. Yet even if it's passing the file to be converted the
> following arguments would vary from one converter to another and so there must
> be some way to indicate them in the htdig.conf file. The documentation says
> you can include arguments if you quote the whole command string but how do I
> indicate the file to be converted and where should the output of that command
> go?
> The documentation also says, "Unless it is an external converter, which
> will output a document of a different content-type, then its output must follow
> the format described here." I'm guessing that my case here is one of the
> external converters and the output doesn't have to conform to that format.
> The documentation also says, "If the second type is user-defined, then it's up
> to the converter script to put out a "Content-Type: type" header followed by a
> blank line, to indicate to htdig what type it should expect for the output,
> much like what a CGI script would do." Is this a user-defined second type?
> I'm guessing that it isn't since it's plain text?
>
> TIA.
>
> Douglas
>
> ========
> Douglas Kline
> [EMAIL PROTECTED]

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
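On Douglas's question of how the PDF file reaches pdftotext: my reading of the external_parsers documentation (please check it against 3.2.0b5) is that ht://Dig runs the command as "command infile content-type URL config-file" and reads a converter's result from standard output. pdftotext only accepts "PDF-file [text-file]", so four extra arguments would be consistent with it printing its help text. Under that assumption, a small wrapper can keep just the input file and pass "-" as the output file, which makes pdftotext write to standard output. The script name pdf2text.py and the pdftotext path are hypothetical.

```python
#!/usr/bin/env python3
"""pdf2text.py: hypothetical wrapper so xpdf's pdftotext can act as an
ht://Dig external converter.

Assumption: ht://Dig runs the converter as
    pdf2text.py infile content-type URL config-file
and reads the converted document from the converter's standard output.
"""
import subprocess
import sys

# Assumed install location of pdftotext; adjust to your xpdf build directory.
PDFTOTEXT = "/usr/local/bin/pdftotext"


def build_command(args: list[str], pdftotext: str = PDFTOTEXT) -> list[str]:
    """Turn ht://Dig's arguments into a pdftotext command line.

    Only the first argument (assumed to be the file holding the fetched PDF)
    is used; content-type, URL and config-file are ignored. The output file
    "-" tells pdftotext to write the text to standard output.
    """
    if not args:
        raise ValueError("missing input file argument")
    return [pdftotext, args[0], "-"]


def main(args: list[str]) -> int:
    # pdftotext inherits our stdout, so its text goes straight back to ht://Dig.
    return subprocess.call(build_command(args))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(main(sys.argv[1:]))
    # Run by hand with no arguments: just show the expected calling convention.
    print("usage: pdf2text.py infile [content-type URL config-file]",
          file=sys.stderr)
```

Made executable and saved at, say, /usr/local/bin/pdf2text.py, the entry would become:

external_parsers: \
application/pdf->text/plain /usr/local/bin/pdf2text.py

text/plain is declared rather than text/html because pdftotext emits plain text; since text/plain is not a user-defined type, no "Content-Type:" header line should be needed, though that too is worth confirming. doc2html.pl, from the FAQ entry above, is the ready-made alternative.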

