Jukka Zitting
Mon, 23 Nov 2009 02:46:29 -0800
Hi,

On Sat, Nov 21, 2009 at 7:21 PM, Alex Ott <alex...@gmail.com> wrote:
> I have one question - is it possible to extract text not only from single
> document, but also text from documents, embedded into archive?
Yes.

> When i send archive (.zip) to tika, i get only list of files, but how i can
> extract content also from files, stored in this archive?

The default AutoDetectParser (or the Tika.parse facade method) automatically
parses all files within an archive and includes their extracted text in the
parse output.

How do you invoke Tika, and which version are you using?

BR,

Jukka Zitting
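To illustrate what "parse all files within an archive and include the extracted text" amounts to, here is a minimal stdlib-only Java sketch. It is not Tika's code and does not use the Tika API. `extractArchiveText` and `sampleZip` are hypothetical helpers, and the sketch assumes every ZIP entry is UTF-8 plain text, whereas AutoDetectParser detects each entry's type and hands it to the matching parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ArchiveTextSketch {

    // Hypothetical helper: walks every entry of a ZIP stream and appends the
    // entry name followed by its content. Assumes all entries are UTF-8 text;
    // a toolkit like Tika would instead detect and parse each entry's format.
    static String extractArchiveText(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(in, StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content of their own
                }
                out.append(entry.getName()).append('\n');
                // ZipInputStream signals end-of-stream at the end of the
                // current entry, so this reads just that entry's bytes.
                byte[] content = zip.readAllBytes();
                out.append(new String(content, StandardCharsets.UTF_8)).append('\n');
            }
        }
        return out.toString();
    }

    // Hypothetical helper: builds a small in-memory ZIP with two text entries.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("world".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = extractArchiveText(new ByteArrayInputStream(sampleZip()));
        System.out.print(text); // entry names interleaved with entry contents
    }
}
```

The point of the sketch is the shape of the output: the text of each contained file appears alongside the file names, rather than only a list of names.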