[ 
https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846978#comment-13846978
 ] 

Peter Ansell commented on TIKA-1208:
------------------------------------

The only ones I referenced above that are not supported in either RDFFormat or 
TupleQueryResultsFormat are the N-Triples, N-Quads, and TriG formats, whose 
specifications are only now entering the final stages of standardisation at 
W3C. 

Their inclusion in Sesame, as the default mime types for their respective 
RDFFormat constants, relies on the actual parsers and writers being updated. 
Including them before that point would imply to users that we have updated the 
parsers and writers to the new specifications. The new specs have some minor 
semantic differences, such as N-Triples and N-Quads using UTF-8 by default 
(yay!), and the TriG spec has had a full overhaul since its previous version, 
so it will take some work to get that through.

> Migrate Any23 mime contributions to Tika
> ----------------------------------------
>
>                 Key: TIKA-1208
>                 URL: https://issues.apache.org/jira/browse/TIKA-1208
>             Project: Tika
>          Issue Type: Sub-task
>          Components: mime
>            Reporter: Lewis John McGibbney
>             Fix For: 1.5
>
>
> We begin with one of the most obvious areas in which there
> is overlap.
> In short, the appeal of this package is the addition of detection 
> for the following types:
>  - text/n3
>  - text/rdf+n3
>  - application/n3
>  - text/x-nquads
>  - text/rdf+nq
>  - text/nq
>  - application/nq
>  - text/turtle
>  - application/x-turtle
>  - application/turtle
>  - application/trix
>  
> Therefore, although both Tika and Any23 perform Mimetype-related
> tasks, there is a contribution to be made. This involves the transferral of
> code pertaining to pattern recognition, Mimetype XML definitions within 
> tika-mimetypes.xml and a Purifier implementation that removes any 
> blank characters at the head of a file that might 
> prevent its MIME Type detection.
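
For illustration, the Purifier step described in the issue could be sketched
roughly as below. This is a minimal sketch, not the actual Any23 code: the
class name LeadingBlankSkipper and its methods are hypothetical, and "blank"
is assumed here to mean ASCII space, tab, CR and LF only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika or Any23.
final class LeadingBlankSkipper {

    // Consumes leading blank bytes and returns a stream positioned at the
    // first non-blank byte (or at end of stream if every byte was blank).
    static InputStream skipLeadingBlanks(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 1);
        int b;
        while ((b = pushback.read()) != -1) {
            if (!isBlank(b)) {
                // Put the first meaningful byte back so detectors can see it.
                pushback.unread(b);
                break;
            }
        }
        return pushback;
    }

    // Assumption: only ASCII space, tab, LF and CR count as blank.
    static boolean isBlank(int b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "  \n\t@prefix ex: <http://example.org/> ."
                .getBytes(StandardCharsets.UTF_8);
        InputStream cleaned = skipLeadingBlanks(new ByteArrayInputStream(data));
        System.out.println((char) cleaned.read()); // prints "@"
    }
}
```

The returned stream would then be handed to whatever mime detection runs
next, so that magic patterns such as "@prefix" match at offset 0.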



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
