tika-user  

Re: how to handle files in archive with tika?

Alex Ott
Mon, 23 Nov 2009 02:54:35 -0800

Hello Jukka

Thank you for your answer. I found the solution yesterday: the problem was that I
hadn't set AutoDetectExtractor as the default parser in the context instance.
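For anyone hitting the same symptom, here is a minimal, self-contained Java sketch of why that setting matters. It does not use Tika at all: `TextParser`, `parseZip`, `autoDetect`, `zipOf` and the `Map`-based context are hypothetical stand-ins. The idea is that the archive parser always lists the entry names, but only extracts entry content when it finds a delegate parser in the context. In the real Tika Java API the analogous step is calling `context.set(Parser.class, parser)` on a `ParseContext`, where `parser` is an `AutoDetectParser` (the class name used in current Tika releases).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ArchiveDelegationSketch {

    /** Hypothetical stand-in for a parser: turns a stream into text. */
    interface TextParser {
        String parse(InputStream in, Map<Class<?>, Object> context) throws IOException;
    }

    /** Hypothetical plain-text parser: decodes the bytes as UTF-8. */
    static String parsePlainText(InputStream in, Map<Class<?>, Object> context) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    /**
     * Hypothetical archive parser: always emits the entry names, but only
     * extracts entry content when a delegate parser is present in the context.
     */
    static String parseZip(InputStream in, Map<Class<?>, Object> context) throws IOException {
        TextParser delegate = (TextParser) context.get(TextParser.class);
        StringBuilder out = new StringBuilder();
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            out.append(entry.getName()).append('\n');
            if (delegate != null && !entry.isDirectory()) {
                // Copy the current entry so the delegate works on its own stream.
                byte[] bytes = zip.readAllBytes();
                out.append(delegate.parse(new ByteArrayInputStream(bytes), context)).append('\n');
            }
        }
        return out.toString();
    }

    /** Hypothetical auto-detecting parser: ZIP magic bytes go to parseZip, anything else is text. */
    static String autoDetect(InputStream in, Map<Class<?>, Object> context) throws IOException {
        byte[] bytes = in.readAllBytes();
        boolean isZip = bytes.length >= 4
                && bytes[0] == 'P' && bytes[1] == 'K' && bytes[2] == 3 && bytes[3] == 4;
        InputStream copy = new ByteArrayInputStream(bytes);
        return isZip ? parseZip(copy, context) : parsePlainText(copy, context);
    }

    /** Builds an in-memory ZIP archive from name -> text pairs (demo helper). */
    static byte[] zipOf(Map<String, String> files) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(f.getKey()));
                zos.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "hello");
        files.put("b.txt", "world");
        byte[] archive = zipOf(files);

        // No delegate in the context: only the list of files comes back.
        Map<Class<?>, Object> bare = new HashMap<>();
        System.out.print(autoDetect(new ByteArrayInputStream(archive), bare));

        // Auto-detecting parser registered as the delegate: names plus content.
        Map<Class<?>, Object> context = new HashMap<>();
        TextParser self = ArchiveDelegationSketch::autoDetect;
        context.put(TextParser.class, self);
        System.out.print(autoDetect(new ByteArrayInputStream(archive), context));
    }
}
```

Running main prints just the two entry names first, then the names followed by their contents. Registering the same auto-detecting function as the delegate is also what lets archives nested inside archives be parsed recursively.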

Jukka Zitting at "Mon, 23 Nov 2009 11:45:37 +0100" wrote:
 JZ> Hi,

 JZ> On Sat, Nov 21, 2009 at 7:21 PM, Alex Ott <alex...@gmail.com> wrote:
 >> I have one question: is it possible to extract text not only from a single
 >> document, but also from documents embedded in an archive?

 JZ> Yes.

 >> When I send an archive (.zip) to Tika, I get only the list of files. How can I
 >> also extract the content of the files stored in this archive?

 JZ> The default AutoDetectExtractor (or the Tika.parse facade method) will
 JZ> automatically parse all files within an archive and include the
 JZ> extracted text in the parse output.

 JZ> How do you invoke Tika and which version are you using?

I use Tika (0.6-SNAPSHOT) from Clojure.

-- 
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/           http://xtalk.msk.su/~ott/
http://alexott-ru.blogspot.com/