I used the following code example from an article that I linked off of jakarta's site to index PDF files:

doc.add(Field.Text("content", new FileReader(f)));

But I realized today that this method only indexes the PDF as is. For those wondering if the the PDF were actually indexed or if maybe they only contained images, well I verified this with LUKE and those PDFs are in there, but the only keywords that were indexed were the PDF defintion statements and encoded stuff.

So what is the proper way to index a PDF?

Thank You


Don Vaillancourt Director of Software Development

WEB IMPACT INC.
416-815-2000 ext. 245
email: [EMAIL PROTECTED]
web: http://www.web-impact.com




This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient.













Reply via email to