On Fri, 28 Mar 2014, eShard wrote:
I'm using Solr 4.0 Final.
I have movies "hidden" in zip files that need to be excluded from the index.
I can't filter movies on the crawler because then I would have to exclude
all zip files.

If you're calling Tika directly, this is very easy. When Tika hits an embedded resource, it calls out to your code, and you can decide at that point whether to process or ignore each one.

(This is all done via an EmbeddedDocumentExtractor, which you supply on the ParseContext.)
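
As a rough sketch of that approach when calling Tika directly (not via Solr): the class below implements EmbeddedDocumentExtractor, skips entries whose name looks like a movie, and hands everything else to Tika's stock ParsingEmbeddedDocumentExtractor. The class name MovieSkippingExtractor, the helpers isMovie and contextFor, and the extension list are made up for illustration. It also assumes the container parser records each entry's name under the "resourceName" metadata key, which is worth checking against your Tika version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

import org.apache.tika.extractor.EmbeddedDocumentExtractor;
import org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical example class: ignores embedded movies, parses everything else.
public class MovieSkippingExtractor implements EmbeddedDocumentExtractor {

    // Tika's default extractor does the real work for entries we keep.
    private final EmbeddedDocumentExtractor delegate;

    public MovieSkippingExtractor(ParseContext context) {
        this.delegate = new ParsingEmbeddedDocumentExtractor(context);
    }

    // Illustrative check by file extension only; the list is an assumption.
    public static boolean isMovie(String name) {
        if (name == null) {
            return false;
        }
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".mp4") || lower.endsWith(".avi")
                || lower.endsWith(".mov") || lower.endsWith(".mkv");
    }

    @Override
    public boolean shouldParseEmbedded(Metadata metadata) {
        // Assumption: the entry name is stored under the "resourceName" key.
        return !isMovie(metadata.get("resourceName"));
    }

    @Override
    public void parseEmbedded(InputStream stream, ContentHandler handler,
                              Metadata metadata, boolean outputHtml)
            throws SAXException, IOException {
        delegate.parseEmbedded(stream, handler, metadata, outputHtml);
    }

    // Builds a ParseContext that recurses with the given parser and uses
    // this extractor for embedded resources.
    public static ParseContext contextFor(Parser parser) {
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);
        context.set(EmbeddedDocumentExtractor.class,
                new MovieSkippingExtractor(context));
        return context;
    }
}
```

You would then pass the result of MovieSkippingExtractor.contextFor(parser) as the ParseContext argument to parser.parse(...). Matching on the file extension keeps the sketch short; checking the detected content type of each entry would be more robust if the movies are not reliably named.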

How do I exclude a file in the tika configuration? I assume it's something I add in the update/extract handler but I'm not sure.

I've no idea how, or even if, you can tell the Solr code to ask Tika to do that. That's something you'll have to take back to the Solr community, as they maintain that code.

Nick
