Re: detect a failed text extraction?

Paco Avila Wed, 25 Nov 2009 09:00:55 -0800

On Wed, Nov 25, 2009 at 2:26 PM, Jukka Zitting <[email protected]> wrote:
> Hi,
>
> On Tue, Nov 24, 2009 at 8:53 PM, Paco Avila <[email protected]> wrote:
>> Is there any way to detect a failed text extraction? I know I can
>> check the log, but the failure is not associated with a file or path.
>> [...]
>> I have posted this question on the user list, but I think it is
>> worth discussing how it could be achieved.
>
> Could we solve this by improving the level of logging in the indexer?
>
> Alternatively, if you don't have easy access to the log files, we
> could possibly inject some special unique term to the index as a
> marker of failed text extraction. That way you could query for all
> nodes for which text extraction failed.


Increasing the log level could be a good approach: the objective is to
link a failed text extraction to a node path. That way, I can see
whether a submitted document failed in the text extraction process. The
other approach (injecting a special term) is also very neat, because I
can get a list of the documents whose indexing failed from an XPath
query. The two solutions can be combined to improve the Jackrabbit
experience: the XPath query gives a list of unindexed documents, and
the log can help to find out what failed in the text extraction.
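
To make the combination concrete, here is a rough sketch of how the two
ideas might fit together. It is plain Java with a toy in-memory index,
not Jackrabbit code: the class name, the marker term and the extractor
function are all made up for illustration, and in a real repository the
lookup would be a query against the search index rather than a map.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExtractionFailureSketch {
    // Hypothetical marker term; not a real Jackrabbit constant.
    static final String FAILED_MARKER = "__text_extraction_failed__";

    private static final Logger LOG =
            Logger.getLogger(ExtractionFailureSketch.class.getName());

    // Toy inverted index: term -> paths of the nodes that contain it.
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Index one binary. If extraction throws, log the failure together
    // with the node path and index the marker term instead of the text.
    void indexBinary(String nodePath, byte[] data,
                     Function<byte[], String> extractor) {
        try {
            String text = extractor.apply(data);
            for (String term : text.toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    addTerm(term, nodePath);
                }
            }
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING,
                    "Text extraction failed for node " + nodePath, e);
            addTerm(FAILED_MARKER, nodePath);
        }
    }

    // Stand-in for the query on the marker term.
    Set<String> failedNodes() {
        return index.getOrDefault(FAILED_MARKER, Collections.emptySet());
    }

    private void addTerm(String term, String nodePath) {
        index.computeIfAbsent(term, k -> new TreeSet<>()).add(nodePath);
    }

    public static void main(String[] args) {
        ExtractionFailureSketch idx = new ExtractionFailureSketch();
        // Made-up extractor that rejects empty binaries.
        Function<byte[], String> extractor = bytes -> {
            if (bytes.length == 0) {
                throw new IllegalArgumentException("empty binary");
            }
            return new String(bytes, StandardCharsets.UTF_8);
        };
        idx.indexBinary("/docs/ok.txt",
                "hello world".getBytes(StandardCharsets.UTF_8), extractor);
        idx.indexBinary("/docs/broken.pdf", new byte[0], extractor);
        System.out.println(idx.failedNodes()); // prints [/docs/broken.pdf]
    }
}
```

The point of the sketch is only that the same catch block serves both
purposes: the log entry carries the node path and the cause, while the
marker term makes the failed nodes findable without access to the logs.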

> Finally, as a debugging tool we could add a feature to the Jackrabbit
> webapp that allows you to download the extracted text content of a
> binary instead of the binary itself. We'd simply run a new text
> extraction pass on the stored binary and return the extracted text or
> any encountered errors to the client.

This could also be interesting.

>
> BR,
>
> Jukka Zitting
>

-- 
Paco Avila
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
