[ 
https://issues.apache.org/jira/browse/JCR-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010800#comment-13010800
 ] 

Jukka Zitting commented on JCR-2873:
------------------------------------

Yes, to the search index such documents look like simple text documents that 
contain just the string "TextExtractionError". You can query for that token and 
include any other constraints (path, etc.) just like when searching for normal 
documents.
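
As a rough illustration of such a query, the sketch below builds a JCR-SQL2 
statement that combines the "TextExtractionError" token with a path 
constraint. The class name, the nt:resource node type and the /content path 
are assumptions made for this example and are not taken from the issue; adjust 
them to the node types and paths actually used in your repository.

```java
// Hypothetical helper (not part of Jackrabbit): builds a JCR-SQL2 statement
// that looks for the extraction-error token below a given path.
final class ExtractionErrorQuery {

    // Token that, per the comment above, is indexed in place of the
    // extracted text when full text extraction fails.
    static final String ERROR_TOKEN = "TextExtractionError";

    // Assumption: binaries are stored in nt:resource nodes, as in the
    // common nt:file / jcr:content layout.
    static String build(String rootPath) {
        if (rootPath == null || !rootPath.startsWith("/")) {
            throw new IllegalArgumentException("absolute path required");
        }
        // Escape single quotes so the path stays a valid SQL2 string literal.
        String escaped = rootPath.replace("'", "''");
        return "SELECT * FROM [nt:resource] AS r"
                + " WHERE CONTAINS(r.*, '" + ERROR_TOKEN + "')"
                + " AND ISDESCENDANTNODE(r, '" + escaped + "')";
    }

    public static void main(String[] args) {
        System.out.println(build(args.length > 0 ? args[0] : "/content"));
    }
}
```

The resulting string could then be passed to 
QueryManager.createQuery(statement, Query.JCR_SQL2) from the standard JCR 2.0 
API and executed like any other query.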

PS. In revision 1085050 I excluded extraction errors caused by linkage problems 
from being reported. They are caused by required extraction libraries not being 
present, which is a configuration/deployment choice rather than an inherent 
problem with the documents being parsed.

> Add a way to locate full text extraction problems
> -------------------------------------------------
>
>                 Key: JCR-2873
>                 URL: https://issues.apache.org/jira/browse/JCR-2873
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: indexing, jackrabbit-core
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> Full text indexing of a binary document can fail for various reasons. 
> Currently we just log a generic error message in such cases, which makes it 
> difficult for the user to locate such problems for review and reindexing. We 
> should improve this by making the logs more informative or by adding some 
> other mechanism for locating troublesome documents.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
