[ https://issues.apache.org/jira/browse/JCR-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010800#comment-13010800 ]

Jukka Zitting commented on JCR-2873:
------------------------------------
Yes, to the search index such documents look like simple text documents that
contain just the string "TextExtractionError". You can query for that token and
include any other constraints (path, etc.) just like when searching for normal
documents.
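
As a rough illustration of such a query, here is a minimal sketch that builds
a JCR-SQL2 statement combining the "TextExtractionError" token with a path
constraint. The class name, the helper methods, the nt:resource node type and
the /content/reports path are assumptions made for this example only; they are
not taken from the Jackrabbit code base.

```java
final class ExtractionErrorQuery {

    /** Token that the search index sees for documents whose extraction failed. */
    static final String MARKER = "TextExtractionError";

    /** Wraps a value in single quotes, doubling embedded quotes (JCR-SQL2 literal). */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Builds a JCR-SQL2 statement that looks for the marker token.
     * nodeType and ancestorPath are hypothetical parameters for this sketch;
     * pass null as ancestorPath to search the whole workspace.
     */
    static String statement(String nodeType, String ancestorPath) {
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT * FROM [").append(nodeType).append("] AS doc");
        sql.append(" WHERE CONTAINS(doc.*, ").append(quote(MARKER)).append(")");
        if (ancestorPath != null) {
            // Extra constraint, just like when searching for normal documents.
            sql.append(" AND ISDESCENDANTNODE(doc, ")
               .append(quote(ancestorPath)).append(")");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(statement("nt:resource", "/content/reports"));
    }
}
```

The resulting string could then be handed to the standard JCR 2.0 call
QueryManager.createQuery(statement, Query.JCR_SQL2) and executed like any
other query.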

PS. In revision 1085050 I excluded extraction errors caused by linkage problems
from being reported. Such errors occur when required extraction libraries are
not present, which is a configuration/deployment choice rather than an inherent
problem with the documents being parsed.
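
The distinction could be sketched roughly as follows. This is not the code
from revision 1085050; the class and method names are hypothetical, and the
sketch only assumes that linkage problems surface as java.lang.LinkageError
(for example NoClassDefFoundError) while document-related failures surface as
ordinary exceptions.

```java
import java.util.concurrent.Callable;

final class ExtractionOutcome {

    /** Token indexed in place of the text when a document cannot be parsed. */
    static final String MARKER = "TextExtractionError";

    /**
     * Runs a (hypothetical) extractor and decides what text gets indexed.
     * Returns the extracted text on success, an empty string when a parser
     * library is missing, and the marker token for any other failure.
     */
    static String extractOrMark(Callable<String> extractor) {
        try {
            return extractor.call();
        } catch (LinkageError e) {
            // Missing extraction library: a deployment choice, so not reported.
            return "";
        } catch (Exception e) {
            // Problem with the document itself: make it findable via the marker.
            return MARKER;
        }
    }
}
```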
> Add a way to locate full text extraction problems
> -------------------------------------------------
>
> Key: JCR-2873
> URL: https://issues.apache.org/jira/browse/JCR-2873
> Project: Jackrabbit Content Repository
> Issue Type: Improvement
> Components: indexing, jackrabbit-core
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Priority: Minor
> Fix For: 2.3.0
>
>
> Full text indexing of a binary document can fail for various reasons.
> Currently we just log a generic error message in such cases, which makes it
> difficult for the user to locate such problems for review and reindexing. We
> should improve this by making the logs more informative or by adding some
> other mechanism for locating troublesome documents.