[ https://issues.apache.org/jira/browse/JCR-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298779#comment-16298779 ]
Tim Allison edited comment on JCR-4215 at 12/20/17 5:31 PM:
------------------------------------------------------------

Tika's behavior, even in 1.16, was to sniff the bytes and trust those over what comes in via the metadata's {{Content-Type}}. Before, you weren't sending any bytes, so it relied on what you told it. Now, you're sending bytes to avoid the {{ZeroByteException}}, and it is sniffing those bytes, detecting text and ignoring the mime you are sending in.

To trigger the BlockingParser:

1. Change the line you mentioned above to:
{noformat}
resource.setProperty("jcr:data", "<?xml version=\"1.0\" encoding=\"UTF-8\" ?> <blocked>FOOBAR</blocked>", PropertyType.BINARY);
{noformat}

2. Add a file called {{custom-mimetypes.xml}} in {{test/resources/org/apache/tika/mime}} that looks like this:
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<!-- ASL 2.0 -->
<mime-info>
  <!-- add this for detection to trigger the BlockingParser -->
  <mime-type type="application/x-blocked">
    <root-XML localName="blocked"/>
    <sub-class-of type="application/xml"/>
  </mime-type>
</mime-info>
{noformat}

As a side note: if you want to override the detector and have it believe whatever you tell it the file is, you can do this with {{TikaCoreProperties.CONTENT_TYPE_OVERRIDE}} as of 1.17.

> Use Tika version 1.17
> ---------------------
>
>                 Key: JCR-4215
>                 URL: https://issues.apache.org/jira/browse/JCR-4215
>             Project: Jackrabbit Content Repository
>          Issue Type: Task
>          Components: parent
>            Reporter: Julian Reschke
>            Assignee: Julian Reschke
>             Fix For: 2.18
>
>         Attachments: TEST-org.apache.jackrabbit.core.query.lucene.IndexingQueueTest.xml, org.apache.jackrabbit.core.query.lucene.IndexingQueueTest.log, org.apache.jackrabbit.core.query.lucene.IndexingQueueTest.txt
>
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
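
For readers who want to see the detection behavior described in the comment in isolation, here is a minimal, stdlib-only Java sketch. It is not Tika's implementation and uses no Tika classes: {{RootXmlSniffer}}, {{detect}} and {{rootLocalName}} are hypothetical names, and the fallback to {{text/plain}} for any non-XML bytes is a simplifying assumption. It only illustrates two points from the comment: with no bytes the declared type is all there is to go on, and with bytes the sniffed root element ({{blocked}}) wins over whatever type the caller declared.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; Tika's real detector is far more elaborate.
final class RootXmlSniffer {

    // Returns the type a byte-sniffing detector of this toy kind would report.
    static String detect(byte[] data, String declaredType) {
        // No bytes to sniff: fall back to what the caller declared.
        if (data == null || data.length == 0) {
            return declaredType;
        }
        String root = rootLocalName(data);
        if (root == null) {
            // Assumption: anything that is not well-formed XML is treated as text.
            return "text/plain";
        }
        // Mirrors <root-XML localName="blocked"/> from custom-mimetypes.xml.
        return "blocked".equals(root) ? "application/x-blocked" : "application/xml";
    }

    // Returns the local name of the root element, or null if the bytes are not XML.
    static String rootLocalName(byte[] data) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities while sniffing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(data));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML; the caller treats this as "not XML".
        }
        return null;
    }
}
```

In this sketch, passing the XML string from step 1 together with a declared type of {{text/plain}} yields {{application/x-blocked}}, while passing an empty array simply returns the declared type, which matches the before/after behavior the comment describes.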