Hi,
Why not just do an os.walk or os.listdir in Python, and then call Tika for each file? E.g.:

    import os
    import json
    from tika import parser

    src = '/some/path'
    out = '/some/other/path'  # existing directory for the per-file output

    # os.listdir returns bare names, so join them with the directory first
    fs = [os.path.join(src, f) for f in os.listdir(src)]
    fs = [f for f in fs if os.path.isfile(f) and f.endswith(('.pdf', '.doc'))]

    for f in fs:
        parsed = parser.from_file(f)
        # save parsed to its own file, one JSON file per input document
        with open(os.path.join(out, os.path.basename(f) + '.json'), 'w') as fh:
            json.dump(parsed, fh)

Cheers,
Chris

From: Victor Olaiya <[email protected]>
Date: Monday, August 19, 2019 at 8:28 AM
To: "Mattmann, Chris A (US 1761)" <[email protected]>
Subject: [EXTERNAL] Urgent!!! Tika-python

Hello,

I sent a mail to the mailing list with no response, so I decided to mail you again.

I have been trying to extract text from all the PDF, DOC, and similar files in a directory, and that has been impossible because Tika-python does not allow parsing a directory, only individual files. I was able to compress the files into a single zip file and extract from that. This worked, but the extracted text was saved in a single file; I need the text saved in individual files so I can use them as input to another program.

Please, what is the best way to go about this?

Thank you, Chris Mattmann. I await your reply.
