External parsers and converters usually use the pdftotext utility from the
xpdf package to extract the text from PDF files.  If a PDF contains only
scanned page images, it has no text for pdftotext to extract, so htdig will
have nothing to index for that document.
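
As a rough illustration, here is a minimal Python sketch of how you could
check whether a PDF yields any text before handing it to htdig.  It assumes
pdftotext (from xpdf) is on your PATH; the function names are my own, not
part of htdig or xpdf.

```python
import shutil
import subprocess


def has_extractable_text(text: str) -> bool:
    """Return True if the converter output holds any non-whitespace text.

    pdftotext typically emits only form feeds and newlines for pages that
    are pure images, and str.strip() treats those as whitespace.
    """
    return bool(text.strip())


def pdf_to_text(pdf_path: str) -> str:
    """Run pdftotext on pdf_path and return its output (possibly empty)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # Passing "-" as the output file makes pdftotext write to standard output.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def is_image_only(pdf_path: str) -> bool:
    """Hypothetical helper: True when pdftotext finds nothing to index."""
    return not has_extractable_text(pdf_to_text(pdf_path))
```

Calling is_image_only("scan.pdf") on a document scanned as images should
return True, which means htdig's external parser would get no words from it.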

Using OCR to extract text from images during indexing is theoretically
possible, but I have not heard of anybody doing it.

--
David Adams
Information Systems Services
Southampton University


----- Original Message -----
From: "Robert Isaac" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, January 18, 2003 9:09 PM
Subject: [htdig] pdf files


> This may sound like a silly question, but if PDF files need to be indexed
> with htdig using an external parser, does the text in the documents need
> to be scanned as text, or can it still be read if scanned as an image? The
> reason I ask is that many of the documents I want to convert to PDF have
> poor paper and type. Thanks
>
> Bob
>
> VOLVO OWNERS CLUB ONLINE
> Robert Isaac, Director, Volvo Owners Club Limited
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of
> unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
>


