Is there any way to detect a failed text extraction? I know I can check the log, but the failure is not associated with a file or path.
Sometimes when I upload a document (Word, PDF, etc.) to my DMS built on Jackrabbit, it is not indexed. Office documents seem to be especially problematic because of their proprietary format. The problem is that I don't know which document had problems in its text extraction, especially if I use extractorPoolSize > 1. I have posted this question on the user list, but I think it is worth discussing how this could be achieved.

-- 
Paco Avila
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
