Re: [htdig] pdf info

Gilles Detillieux Mon, 29 Jan 2001 10:50:31 -0800
According to Ronald Edward Petty:
> 1) If a PDF is in your directory that is searched with htdig, other than the
> name of the file, are the contents looked at?  If so, how?  I know there
> is something to do with external parsers, but I can't find out if this
> works or not.  I thought I read there is an internal one that does it a
> little, but if this is true I can't search for a single word in the given PDF.
> 
> Any documentation on this?

http://htdig.sourceforge.net/attrs.html#external_parsers
http://htdig.sourceforge.net/FAQ.html#q4.9

The name of the file will not be indexed unless it appears in the link
description text of the document that links to this file (e.g. in an
automatically generated directory index from Apache).

There is a partially internal PDF parser in 3.1.x, which requires
the acroread (Adobe Acrobat Reader 3.0) program to convert PDF files
to PostScript.  We don't recommend using this approach.  Also, the FAQ
entry above is a bit dated, and we don't recommend using parse_doc.pl
as an external parser either.  You should use conv_doc.pl, or better
still, doc2html.pl, as an external converter script.  Does it work?
You bet it does!  I index dozens of PDFs on our site.  Just follow the
instructions that come with the script you use, and be sure to configure
it correctly and install all the conversion programs that the script uses
(especially the xpdf 0.90 or 0.91 package that converts PDFs).
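
As a rough sketch only, the htdig.conf entries for an external converter
would look something like the following.  The install path and the size
value here are assumptions for illustration, not taken from any particular
setup, so adjust them to match where you put the script and how big your
PDFs are:

    # Hand PDFs to the converter script, which returns HTML to htdig.
    # (/usr/local/bin is an assumed location for doc2html.pl.)
    external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl

    # Illustrative value: raise the limit so large PDFs are not truncated
    # before they reach the converter.
    max_doc_size: 5000000

The instructions shipped with the script remain the authority on the exact
settings it expects.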

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-general