Hi Gorka,

See: http://wiki.apache.org/tika/TikaOCR/

Is that what you’re looking for? If so, you can simply enable OCR on the Tika
REST server and then point your Tika Python client at it. Does that help?
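
For reference, here is a minimal sketch of talking to a Tika server directly from Python using only the standard library, without the tika-python client. It assumes a Tika server is already running on localhost at Tika's default port (9998), that its /unpack endpoint returns a PDF's embedded resources as a zip archive, and that the X-Tika-PDFextractInlineImages header asks the PDF parser to extract inline images as well. Please check those three points against the Tika server documentation for the version you run. The helper names (build_unpack_request, read_embedded_files, extract_embedded) are hypothetical.

```python
import io
import urllib.request
import zipfile

# Assumption: a Tika server is running locally on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_unpack_request(pdf_bytes, server=TIKA_SERVER):
    """Build (but do not send) a PUT request to the server's /unpack endpoint.

    The X-Tika-PDFextractInlineImages header is assumed to tell the PDF
    parser to also extract inline images; verify it for your Tika version.
    """
    return urllib.request.Request(
        url=server.rstrip("/") + "/unpack",
        data=pdf_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/pdf",
            "Accept": "application/zip",
            "X-Tika-PDFextractInlineImages": "true",
        },
    )


def read_embedded_files(zip_bytes):
    """Turn the zip archive returned by /unpack into {file name: bytes}."""
    # Assumption: an empty body (e.g. HTTP 204) means nothing was embedded.
    if not zip_bytes:
        return {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def extract_embedded(pdf_path, server=TIKA_SERVER):
    """Send a PDF to the Tika server and return its embedded files.

    This performs a network call, so the server must be reachable.
    """
    with open(pdf_path, "rb") as fh:
        request = build_unpack_request(fh.read(), server)
    with urllib.request.urlopen(request) as response:
        return read_embedded_files(response.read())
```

Usage would be something like `images = extract_embedded("paper.pdf")`, giving a dict from each embedded file name to its raw bytes. OCR is a separate concern: with OCR enabled, text recognised in images comes back as extracted text rather than as image files. If you stay with tika-python, the server URL is passed as `serverEndpoint`, e.g. `parser.from_file("paper.pdf", serverEndpoint="http://localhost:9998")`.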

Cheers,

Chris

From: gorka gallo <[email protected]>
Date: Wednesday, May 3, 2017 at 2:19 AM
To: "Mattmann, Chris A (3010)" <[email protected]>
Subject: Apache Tika

Hi Chris, 

I am Gorka Gallo, a research technician from Bilbao, Spain.

Is there any way to extract embedded images from PDF files with Apache Tika
using Python?

Thanks,

Best regards,

Gorka.