Alex Ott
Mon, 23 Nov 2009 02:54:35 -0800
Hello Jukka,

thank you for your answer. I found the solution yesterday: the problem was that I hadn't set AutoDetectExtractor as the default parser in the context instance.
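
For anyone who hits the same symptom (only file names come back for a .zip), here is a rough sketch of why a missing parser in the context could produce it. It deliberately does not use Tika: `EntryParser`, `Context`, `EMPTY`, `PLAIN_TEXT`, `parseZip` and `sampleZip` are hypothetical names made up for illustration, and the sketch assumes that an archive parser looks up its delegate in the context and falls back to one that extracts nothing. Only `java.util.zip` from the JDK is used (Java 9+ for `readAllBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class EmbeddedParseSketch {

    /** Hypothetical stand-in for a parser interface; not Tika's API. */
    interface EntryParser {
        String parse(InputStream in) throws IOException;
    }

    /** Hypothetical stand-in for a parse context keyed by type. */
    static class Context {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            values.put(key, value);
        }

        <T> T get(Class<T> key, T fallback) {
            Object v = values.get(key);
            return v == null ? fallback : key.cast(v);
        }
    }

    /** Fallback delegate: extracts nothing from an embedded entry. */
    static final EntryParser EMPTY = in -> "";

    /** Delegate that treats every embedded entry as UTF-8 text. */
    static final EntryParser PLAIN_TEXT =
            in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);

    /** Lists every entry name and appends whatever the delegate extracts. */
    static String parseZip(byte[] zip, Context context) throws IOException {
        EntryParser delegate = context.get(EntryParser.class, EMPTY);
        StringBuilder out = new StringBuilder();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                out.append(entry.getName()).append('\n');
                // Reads only up to the end of the current entry.
                String text = delegate.parse(zin);
                if (!text.isEmpty()) {
                    out.append(text).append('\n');
                }
            }
        }
        return out.toString();
    }

    /** Builds a two-entry archive in memory so the sketch needs no files. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("b.txt"));
            zout.write("world".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip();

        // No delegate in the context: only the entry names are produced.
        System.out.print(parseZip(zip, new Context()));

        // Delegate registered: the entry contents are included as well.
        Context context = new Context();
        context.set(EntryParser.class, PLAIN_TEXT);
        System.out.print(parseZip(zip, context));
    }
}
```

With an empty context the sketch yields just the two file names; once a delegate is registered under the parser key, the text of each entry follows its name. Whether the real library behaves exactly this way in 0.6-SNAPSHOT should be checked against its sources.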

Jukka Zitting at "Mon, 23 Nov 2009 11:45:37 +0100" wrote:
 JZ> Hi,

 JZ> On Sat, Nov 21, 2009 at 7:21 PM, Alex Ott <alex...@gmail.com> wrote:
 >> I have one question - is it possible to extract text not only from single
 >> document, but also text from documents, embedded into archive?

 JZ> Yes.

 >> When i send archive (.zip) to tika, i get only list of files, but how i can
 >> extract content also from files, stored in this archive?

 JZ> The default AutoDetectExtractor (or the Tika.parse facade method) will
 JZ> automatically parse all files within an archive and include the
 JZ> extracted text in the parse output.

 JZ> How do you invoke Tika and which version are you using?

I use tika (0.6-SNAPSHOT) from clojure

--
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/
http://xtalk.msk.su/~ott/
http://alexott-ru.blogspot.com/