[ https://issues.apache.org/jira/browse/JCR-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829153#comment-13829153 ]

Claudiu commented on JCR-3667:
------------------------------

Hi,
   Why does the issue state that the problem is fixed in version 2.6.4, when it
isn't?
   The 2.6.4 artifacts downloadable from the Jackrabbit page (I'm using the RAR
archive, for example) still bundle Tika 1.3. Upgrading to Tika 1.4 makes no
difference, though, because the problem lies in the XMLParser, which does not
know how to resolve the text/xml media type.
   I hope Jukka's recommendation to have Tika normalize type names is actually
in progress.
   I recently upgraded from 2.4.0 to 2.6.4 and was really puzzled that XML
content was no longer being indexed.

Regards,
 Claudiu

> Possible regression with accepted content types when extracting and indexing binary values
> ------------------------------------------------------------------------------------------
>
>                 Key: JCR-3667
>                 URL: https://issues.apache.org/jira/browse/JCR-3667
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>    Affects Versions: 2.4.4, 2.6.3
>            Reporter: Cédric Damioli
>            Assignee: Jukka Zitting
>              Labels: patch
>             Fix For: 2.7.3
>
>
> JCR-3476 introduced a mime-type test before parsing binary values, based on 
> Tika's supported parsers.
> This may lead to incorrect behaviour, with "text/xml" content not being
> extracted and indexed because the XMLParser does not declare "text/xml" as a
> supported type.
> This is a regression between 2.4.3 and 2.4.4: the same content was
> previously recognized correctly by Tika's Detector and then extracted.
> Furthermore, it seems inconsistent to me to rely on the declared content type
> on the one hand, while delegating the actual type detection to Tika on the
> other.
> This may lead to cases where the jcr:mimeType value is set to, e.g.,
> "application/pdf", but the content is detected and parsed by Tika as
> "text/plain" with no error.
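A minimal sketch of the failure mode described in the issue. It assumes a parser that declares only the canonical "application/xml" type, plus an alias table mapping "text/xml" to that canonical name. The class name, method names, and type tables are hypothetical and are not Jackrabbit's or Tika's actual code. An exact string comparison against the declared types rejects the "text/xml" alias, whereas normalizing the name first (the approach Jukka recommended) accepts it.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the pre-parse media type check discussed in JCR-3667.
// The tables below are illustrative assumptions, not Jackrabbit or Tika code.
class MimeTypeCheckSketch {

    // Assumed: types a parser declares as supported. Like the XMLParser in the
    // report, it lists the canonical XML type but not its "text/xml" alias.
    static final Set<String> SUPPORTED =
            Set.of("application/xml", "application/pdf", "text/plain");

    // Assumed alias table: alias name -> canonical type name.
    static final Map<String, String> ALIASES =
            Map.of("text/xml", "application/xml");

    // Exact-match check: an alias of a supported type is rejected.
    static boolean isSupportedExact(String mimeType) {
        return SUPPORTED.contains(mimeType);
    }

    // Lower-cases and trims the name, then maps a known alias to its
    // canonical form; unknown names are returned unchanged (lower-cased).
    static String normalize(String mimeType) {
        String lower = mimeType.trim().toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(lower, lower);
    }

    // Normalizing check: aliases resolve to the canonical type before lookup.
    static boolean isSupportedNormalized(String mimeType) {
        return SUPPORTED.contains(normalize(mimeType));
    }

    public static void main(String[] args) {
        for (String type : new String[] {"text/xml", "application/xml", "image/png"}) {
            System.out.println(type + ": exact=" + isSupportedExact(type)
                    + ", normalized=" + isSupportedNormalized(type));
        }
    }
}
```

Running main shows "text/xml" rejected by the exact check but accepted once the name is normalized, while an unrelated type such as "image/png" is rejected by both.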



--
This message was sent by Atlassian JIRA
(v6.1#6144)
