[
https://issues.apache.org/jira/browse/JCR-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298779#comment-16298779
]
Tim Allison commented on JCR-4215:
----------------------------------
Tika's behavior, even in 1.16, was to sniff the bytes and trust the result over
the Content-Type passed in via the metadata. Before, you weren't sending any
bytes, so it relied on the type you told it. Now you're sending bytes to avoid
the ZeroByteException, so it sniffs those bytes, detects text, and ignores the
MIME type you are sending in.
To trigger the BlockingParser:
1. Change the line you mentioned above to:
{noformat}
resource.setProperty("jcr:data",
        "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>"
        + "<blocked>FOOBAR</blocked>",
        PropertyType.BINARY);
{noformat}
2. Add a file called {{custom-mimetypes.xml}} in
{{test/resources/org/apache/tika/mime}} that looks like this:
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<!-- ASL 2.0 -->
<mime-info>
  <!-- add this to send files to the BlockingParser -->
  <mime-type type="application/x-blocked">
    <root-XML localName="blocked"/>
    <sub-class-of type="application/xml"/>
  </mime-type>
</mime-info>
{noformat}
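To see why the payload from step 1 matches the {{root-XML}} rule in step 2, here is a rough, standalone illustration using only the JDK's StAX parser. It is not Tika code, and the class and method names ({{RootElementSketch}}, {{rootLocalName}}) are made up for this example; it only shows that the root element's local name in that payload is {{blocked}}, which is the value the rule keys on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Illustration only (hypothetical helper, not part of Tika or Jackrabbit). */
final class RootElementSketch {

    /** Returns the local name of the root element, or null if there is none. */
    static String rootLocalName(byte[] xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(xml));
            try {
                while (reader.hasNext()) {
                    // The first START_ELEMENT event is the root element.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = ("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>"
                + "<blocked>FOOBAR</blocked>").getBytes(StandardCharsets.UTF_8);
        System.out.println(rootLocalName(payload)); // prints "blocked"
    }
}
```
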
As a side note: if you want to override the detector and have it believe
whatever you tell it the file is, you can do this with
{{TikaCoreProperties.CONTENT_TYPE_OVERRIDE}} as of 1.17.
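
The detection order described in this comment can be summed up in a toy model. This is a deliberately simplified sketch, not Tika's implementation: the class {{DetectionOrderSketch}}, its methods, and the crude {{sniff}} heuristic are all assumptions made for illustration, and the real detector takes more inputs into account than this. It only captures the three cases mentioned here: an explicit override wins, no bytes means the declared type is used, and otherwise the sniffed type wins over the declared one.

```java
import java.nio.charset.StandardCharsets;

/** Toy model of the detection order described above; not Tika code. */
final class DetectionOrderSketch {

    static String detect(byte[] data, String declaredType, String overrideType) {
        // 1. An explicit override is believed as-is (the role the comment
        //    ascribes to CONTENT_TYPE_OVERRIDE).
        if (overrideType != null) {
            return overrideType;
        }
        // 2. Nothing to sniff: rely on what the caller declared.
        if (data == null || data.length == 0) {
            return declaredType != null ? declaredType : "application/octet-stream";
        }
        // 3. Bytes are present: the sniffed type is trusted over the declared one.
        return sniff(data);
    }

    /** Crude stand-in for content sniffing: XML prolog, printable text, or binary. */
    static String sniff(byte[] data) {
        String head = new String(data, 0, Math.min(data.length, 64),
                StandardCharsets.ISO_8859_1);
        if (head.startsWith("<?xml")) {
            return "application/xml";
        }
        for (char c : head.toCharArray()) {
            // Control characters other than tab/newline/CR suggest binary data.
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                return "application/octet-stream";
            }
        }
        return "text/plain";
    }
}
```

In this model, plain-text bytes declared as {{application/x-blocked}} come back as {{text/plain}}, which mirrors the behavior reported in the test; passing the same type as the override returns it unchanged.
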
> Use Tika version 1.17
> ---------------------
>
> Key: JCR-4215
> URL: https://issues.apache.org/jira/browse/JCR-4215
> Project: Jackrabbit Content Repository
> Issue Type: Task
> Components: parent
> Reporter: Julian Reschke
> Assignee: Julian Reschke
> Fix For: 2.18
>
> Attachments:
> TEST-org.apache.jackrabbit.core.query.lucene.IndexingQueueTest.xml,
> org.apache.jackrabbit.core.query.lucene.IndexingQueueTest.log,
> org.apache.jackrabbit.core.query.lucene.IndexingQueueTest.txt
>
>