Dear Jörg,

Thank you very much for sending this. I have been meaning to reply to your earlier emails on the same subject. Yes, it will work for other file types. Could you upload an example of a file it's not working for to a GitHub issue? I can take a look.
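Here is a minimal sketch of what parsing another file type might look like. It assumes the `tika` package from PyPI is installed and that a Java runtime is available, because tika-python starts a local Tika server on first use. The same `parser.from_file()` call used for PDF and ODT is applied to any file, and Tika detects the type itself. For audio and video, expect mostly metadata and little or no extracted text. The `summarize()` helper and the example filename are hypothetical illustrations, not part of tika-python.

```python
"""Sketch: parse an arbitrary file type (audio, video, images, ...) with tika-python.

Assumes `pip install tika` and a Java runtime. summarize() is a hypothetical
helper, not part of the tika package.
"""
import sys
from typing import Any, Dict


def summarize(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Pull commonly useful fields out of the dict returned by parser.from_file()."""
    metadata = parsed.get("metadata") or {}
    content = parsed.get("content") or ""  # media files often yield little or no text
    return {
        "status": parsed.get("status"),
        "content_type": metadata.get("Content-Type"),
        "metadata_keys": sorted(metadata.keys()),
        "text": content.strip(),
    }


def main(path: str) -> None:
    # Imported here so summarize() can be used without tika installed.
    from tika import parser

    # Same call as for PDF or ODT; Tika auto-detects the file type.
    parsed = parser.from_file(path)
    info = summarize(parsed)
    print("Status:", info["status"])
    print("Content-Type:", info["content_type"])
    for key in info["metadata_keys"]:
        print(f"  {key}: {parsed['metadata'][key]}")
    print("Text:", info["text"][:500] or "<none>")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python parse_any.py song.mp3` (both names are hypothetical); the printed metadata keys show what Tika extracted for that format.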
Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-502
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 11/3/16, 5:15 PM, "Jörg Bilert" <[email protected]> wrote:

Hello Mr Mattmann,

I have just been looking into your Python wrapper for Tika and I like it a lot. But there is one thing I just don't see. According to the Apache Tika website, Tika supports a lot of file formats (even audio and video), but I don't know how to parse them in Python. ODT and PDF work fine, as in the sample code on your GitHub page. Could you give me a clue where to start with handling other file types?

Yours,
Jörg Bilert
