RE: PDF parser for Lucene

sampreet Fri, 23 Nov 2001 00:39:51 -0800

Hello,

We have been using PDFHandler - a pdf parser provided by websearch, to
search in pdf files. We are trying to get the contents using
pdfHandler.getContents() to arrive at a context-sensitive summary. However,
it gives some yen signs and other special symbols in the title, summary and
contents. If anyone is using the websearch component to parse pdf files and
have encountered this problem, kindly give your suggestions.


Note - Most of the pdf files are using WinAnsiEncoding, and setting the
encoding as Win-12xx doesn't help.

Thanks in advance,

Sampreet
Programmer


You could try this one:
http://www.i2a.com/websearch/

...and then tell me how it works for you.
=:o)


Anyway, it is simple and Open Source.


Have fun,
Paulo Gaspar


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

RE: PDF parser for Lucene

Reply via email to