My boss has tasked me with finding out whether Nutch can index text that appears inside an image in a PDF document, that is, run OCR on the image and recognize its content as text. In other words, if someone uploads a PDF to our site, and that PDF is really just an image saved as a PDF, will Nutch extract the text within the image and index it as part of that PDF document?
*Does Nutch index the content of an image-based PDF as text?*