Hi,

On Tue, Nov 24, 2009 at 8:53 PM, Paco Avila <[email protected]> wrote:
> There is any way to detect a failed text extraction ? I know, I can
> see the log but the failure it not associated to a file or path.
> [...]
> I have posted this question in the user list, but I think it is
> interesting talking about how it can be achieved.
Could we solve this by improving the logging in the indexer, so that each text extraction failure is logged together with the path of the affected node?

Alternatively, if you don't have easy access to the log files, we could inject a special unique term into the index as a marker of failed text extraction. That way you could query for all nodes for which text extraction failed.

Finally, as a debugging tool, we could add a feature to the Jackrabbit webapp that lets you download the extracted text content of a binary instead of the binary itself. We'd simply run a new text extraction pass on the stored binary and return the extracted text, or any errors encountered, to the client.

BR,

Jukka Zitting
