On Wed, 1 Sep 2010, Andrzej Bialecki wrote:
> This would be very useful. We contemplated implementing something like this in Nutch, to handle archives (jar/tar/zip/...), but having it in Tika would be much better.

I'd forgotten about tar, that's another one to handle... :)

> Does recursive here mean that it would look into embedded zip files too? Or that it would process all paths (since there is really no hierarchy in zip files)?

I was thinking recursive could mean different things. For zip files, tar files, etc., it would probably just mean root directory only vs. descending into all directories. For OLE2, it would mean checking embedded documents of embedded documents (normally, but not always, by descending into child directories). Maybe there's a clearer name for this sort of thing?
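
To make the two readings of "recursive" concrete for zip files, here is a rough sketch in Python using only the standard zipfile module. It is not Tika code and not a proposed API: list_entries, make_zip, and the descend_dirs / descend_archives flags are hypothetical names made up for illustration, and the "!/" separator for nested paths is just an assumed convention.

```python
import io
import zipfile


def list_entries(data, descend_dirs=True, descend_archives=False, prefix=""):
    """Return the file entry names of a zip archive given as bytes.

    descend_dirs: if False, keep only entries at the archive's root
                  (zip has no real hierarchy, so this is a test on "/").
    descend_archives: if True, also open embedded .zip entries and
                      list their contents under "<entry>!/".
    """
    names = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            if not descend_dirs and "/" in info.filename:
                continue  # root-only mode: ignore anything in a subdirectory
            names.append(prefix + info.filename)
            if descend_archives and info.filename.lower().endswith(".zip"):
                # Second meaning of "recursive": look inside embedded archives.
                nested_prefix = prefix + info.filename + "!/"
                names.extend(
                    list_entries(zf.read(info), descend_dirs,
                                 descend_archives, nested_prefix)
                )
    return names


def make_zip(files):
    """Build an in-memory zip from a {name: bytes} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


inner = make_zip({"inner.txt": b"hello"})
outer = make_zip({
    "readme.txt": b"top",
    "docs/a.txt": b"a",
    "docs/nested.zip": inner,
})
```

With this sample archive, list_entries(outer, descend_dirs=False) gives only the root entry, the default call gives all three paths, and descend_archives=True additionally lists the file inside docs/nested.zip. Keeping the two behaviours as separate flags is one way to avoid overloading a single "recursive" option.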

Nick
