On Wed, Nov 25, 2009 at 2:26 PM, Jukka Zitting <[email protected]> wrote:
> Hi,
>
> On Tue, Nov 24, 2009 at 8:53 PM, Paco Avila <[email protected]> wrote:
>> Is there any way to detect a failed text extraction? I know I can
>> see the log, but the failure is not associated with a file or path.
>> [...]
>> I have posted this question on the user list, but I think it is
>> worth discussing how it could be achieved.
>
> Could we solve this by improving the level of logging in the indexer?
>
> Alternatively, if you don't have easy access to the log files, we
> could possibly inject some special unique term to the index as a
> marker of failed text extraction. That way you could query for all
> nodes for which text extraction failed.

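To make the two options quoted above concrete, here is a minimal, language-neutral sketch in Python. It is not Jackrabbit code: the dictionary stands in for the search index, and `FAILED_MARKER`, `index_document`, `failed_paths` and `utf8_extractor` are hypothetical names made up for illustration. On a failed extraction it logs the node path and stores a marker term, so that the failed nodes can be listed later.

```python
import logging

logger = logging.getLogger("indexer")

# Hypothetical marker term; not an actual Jackrabbit constant.
FAILED_MARKER = "__text_extraction_failed__"


def index_document(index, path, data, extract):
    """Index one binary; on failure, log the path and store the marker term."""
    try:
        terms = set(extract(data).lower().split())
    except Exception as exc:
        # Tie the failure to the node path so the log entry is actionable.
        logger.warning("Text extraction failed for %s: %s", path, exc)
        terms = {FAILED_MARKER}
    index[path] = terms


def failed_paths(index):
    """Return the paths whose text extraction failed, in sorted order."""
    return sorted(p for p, terms in index.items() if FAILED_MARKER in terms)


def utf8_extractor(data):
    # Stand-in extractor: raises UnicodeDecodeError on non-UTF-8 input.
    return data.decode("utf-8")


index = {}
index_document(index, "/docs/readme.txt", b"hello world", utf8_extractor)
index_document(index, "/docs/broken.pdf", b"\xff\xfe\x00", utf8_extractor)
print(failed_paths(index))
```

In a real repository the lookup would be a query against the index rather than a scan of a dictionary, and the marker would have to be a term that cannot occur in ordinary document text.
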
Increasing the log level would be a good approach: the goal is to link
a failed text extraction to a node path, so that I can tell whether a
submitted document failed during text extraction. The other approach
(injecting a special term) is also very neat, because I could get a
list of the documents that failed to index from an XPath query. The two
solutions can be combined to improve the Jackrabbit experience: the
XPath query gives a list of unindexed documents, and the log helps to
find out what went wrong in the text extraction.

> Finally, as a debugging tool we could add a feature to the Jackrabbit
> webapp that allows you to download the extracted text content of a
> binary instead of the binary itself. We'd simply run a new text
> extraction pass on the stored binary and return the extracted text or
> any encountered errors to the client.

This could also be interesting.

> BR,
>
> Jukka Zitting
>

--
Paco Avila
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
