Hi Gorka,
See: http://wiki.apache.org/tika/TikaOCR/

Is that what you're looking for? If so, you can simply enable OCR for the Tika REST server and then point your Tika Python at it.

Does that help?

Cheers,
Chris

From: gorka gallo <[email protected]>
Date: Wednesday, May 3, 2017 at 2:19 AM
To: "Mattmann, Chris A (3010)" <[email protected]>
Subject: Apache Tika

Hi Chris,

I am Gorka Gallo, a research technician from Bilbao, Spain. Is there any method to extract embedded images in PDF files with Apache Tika using Python?

Thanks,
Best regards,
Gorka.
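
For reference, a rough sketch of what pointing Python at a Tika REST server might look like for the question above. This is not code from the thread. It uses only the Python standard library rather than the Tika Python package, and it assumes a server is already running at http://localhost:9998 (the usual default), that its /unpack endpoint returns embedded resources as a zip archive, and that the X-Tika-PDFextractInlineImages request header is honoured by the server version in use. The function names and the output file name are made up for illustration.

```python
import urllib.request

# Assumption: a Tika REST server is already running on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_unpack_request(pdf_bytes, server=TIKA_SERVER):
    """Build (but do not send) a PUT request for the server's /unpack endpoint.

    Hypothetical helper: /unpack is expected to return the document's
    embedded resources as an archive.
    """
    return urllib.request.Request(
        url=server.rstrip("/") + "/unpack",
        data=pdf_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/pdf",
            # Assumption: asks the PDF parser to also extract inline images.
            "X-Tika-PDFextractInlineImages": "true",
        },
    )


def fetch_embedded_zip(pdf_path, out_path="embedded.zip", server=TIKA_SERVER):
    """Send the PDF to the server and save whatever archive it returns."""
    with open(pdf_path, "rb") as f:
        req = build_unpack_request(f.read(), server)
    # Needs a reachable server; an empty response may mean nothing was embedded.
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
    return out_path
```

Calling fetch_embedded_zip("report.pdf") would then write the archive to embedded.zip, which can be opened with the standard zipfile module. OCR of the extracted images is a separate step, configured on the server as described on the wiki page linked above.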
