[ 
https://issues.apache.org/jira/browse/ANY23-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280638#comment-13280638
 ] 

Peter Ansell commented on ANY23-98:
-----------------------------------

I started on a patch for this issue in my GitHub fork, but I didn't manage to
complete it for N3 for some reason, and it looked rather hacky and error-prone.
I ended up switching to checking whether the format was valid using
Rio.getParser(RDFFormat) and then attempting to parse, as is now done for the
Turtle parser. I am still trying to extract a clean, small top section from
the document, but I had to include some new definitions. [1]
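As a rough illustration of the "attempt to parse" strategy, here is a hypothetical, self-contained Java sketch. It is not the code in the fork and it does not use Rio: it drops leading `#` header comments, takes a small top section of the document, and returns the first candidate format whose toy, regex-based line checker accepts every statement in that section. The class name, the regexes, the 10-line section size and the MIME type strings are all assumptions made for the example; a real implementation would hand the section to a proper RDF parser (such as one obtained through Rio, as described above) and treat a parse exception as "not this format".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of format detection by attempted parsing.
 * The regex "parsers" are toy stand-ins, not Rio and not full RDF grammars.
 */
public class TryParseDetector {

    // Toy term patterns: IRI, blank node, or literal with optional language tag/datatype.
    private static final String IRI = "<[^>\\s]*>";
    private static final String BNODE = "_:\\S+";
    private static final String LITERAL =
        "\"(?:[^\"\\\\]|\\\\.)*\"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\\^\\^" + IRI + ")?";
    private static final String SUBJ = "(?:" + IRI + "|" + BNODE + ")";
    private static final String OBJ = "(?:" + IRI + "|" + BNODE + "|" + LITERAL + ")";

    // Toy statement checkers: a triple, or a triple followed by a graph term.
    private static final Pattern NTRIPLE =
        Pattern.compile(SUBJ + "\\s+" + IRI + "\\s+" + OBJ + "\\s*\\.");
    private static final Pattern NQUAD =
        Pattern.compile(SUBJ + "\\s+" + IRI + "\\s+" + OBJ + "\\s+" + SUBJ + "\\s*\\.");

    // Candidate formats in the order they are tried (stricter checker first).
    private static final Map<String, Pattern> CANDIDATES = new LinkedHashMap<>();
    static {
        CANDIDATES.put("application/n-quads", NQUAD);
        CANDIDATES.put("application/n-triples", NTRIPLE);
    }

    /** Skips blank lines and '#' comments, keeping at most maxLines statement lines. */
    static List<String> topSection(String doc, int maxLines) {
        List<String> out = new ArrayList<>();
        for (String line : doc.split("\\R")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue; // header comments no longer block detection
            }
            out.add(t);
            if (out.size() == maxLines) {
                break;
            }
        }
        return out;
    }

    /** Returns the first candidate whose checker accepts every top-section line, else null. */
    static String detect(String doc) {
        List<String> lines = topSection(doc, 10);
        if (lines.isEmpty()) {
            return null;
        }
        for (Map.Entry<String, Pattern> candidate : CANDIDATES.entrySet()) {
            boolean parsed = true;
            for (String line : lines) {
                if (!candidate.getValue().matcher(line).matches()) {
                    parsed = false; // "parse failed": try the next format
                    break;
                }
            }
            if (parsed) {
                return candidate.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String nq = "# generated header comment\n"
            + "<http://ex.org/s> <http://ex.org/p> \"o\" <http://ex.org/g> .\n";
        String nt = "# another header\n"
            + "<http://ex.org/s> <http://ex.org/p> <http://ex.org/o> .\n";
        System.out.println(detect(nq)); // application/n-quads
        System.out.println(detect(nt)); // application/n-triples
    }
}
```

Trying the stricter checker first matters because N-Triples lines are also valid N-Quads in the real grammar; the toy N-Quads checker sidesteps this by requiring the graph term.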

I was going to raise an issue for it, but I didn't yet have a solution myself
that didn't change virtually everything about the way TikaMIMETypeDetector works.

Also, I was able to update to Tika-1.1 successfully. I patched the
mimetypes.xml file from Tika-1.1 to include the RDF definitions it did not
already contain. You can see the patched version at [2].

[1] https://github.com/ansell/any23/blob/master/mime/src/main/java/org/apache/any23/mime/TikaMIMETypeDetector.java
[2] https://github.com/ansell/any23/blob/master/mime/src/main/resources/org/apache/any23/mime/mimetypes.xml

> TikaMIMEtypeDetector doesn't recognize certain file formats when they contain header comments
> ---------------------------------------------------------------------------------------------
>
>                 Key: ANY23-98
>                 URL: https://issues.apache.org/jira/browse/ANY23-98
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Michele Mostarda
>             Fix For: 0.8.0
>
>
> Adding header comments to NQ, N3 and RSS files prevents the
> TikaMIMEtypeDetector from working properly.
> See #ANY23-97 for further details.

--
This message is automatically generated by JIRA.
