[
https://issues.apache.org/jira/browse/STANBOL-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler resolved STANBOL-809.
-----------------------------------------
Resolution: Implemented
Assignee: Rupert Westenthaler
Implemented in trunk with http://svn.apache.org/viewvc?rev=1413551&view=rev
> Pass ContentItem URI to the Tika content type detector
> ------------------------------------------------------
>
> Key: STANBOL-809
> URL: https://issues.apache.org/jira/browse/STANBOL-809
> Project: Stanbol
> Issue Type: Bug
> Components: Engine - Tika
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
> Priority: Minor
>
> The content type detection could be improved by using the URI of the
> processed content item, as the Tika API allows the file name (or URI)
> of a resource to be passed explicitly as an input parameter to the
> content type detection (see
> https://tika.apache.org/1.2/detection.html#Resource_Name_Based_Detection)
>   // "is" is the InputStream of the content item being processed
>   Metadata m = new Metadata();
>   m.add(Metadata.RESOURCE_NAME_KEY,
>       contentItem.getUri().getUnicodeString());
>   detector.detect(is, m);
> This would mean that the filename-pattern-based recognition would
> work when you manually set the contentItem URI in the request to the
> Stanbol Enhancer, e.g.
>   curl -X POST -H "Accept: text/turtle" -T test.docx \
>     "http://dev.iks-project.eu:8080/enhancer/engine/tika?id=http://www.example.com/test.docx"
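
To illustrate why the resource name helps, here is a minimal, Tika-free
sketch in plain Java. It is not how Tika or the Stanbol Tika engine
implement detection: the class name ResourceNameHint, the methods
resourceName and detect, and the three-entry extension table are all
hypothetical and chosen only for illustration. It assumes that content
sniffing has already produced some type (a .docx file is a ZIP container,
so sniffing alone may only yield a generic archive type) and lets the file
name taken from the content item URI refine that result.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of Tika or Stanbol.
class ResourceNameHint {

    // Assumed, deliberately tiny extension table. Tika's real file name
    // patterns are far more complete.
    private static final Map<String, String> EXTENSION_TYPES = new LinkedHashMap<>();
    static {
        EXTENSION_TYPES.put(".docx",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        EXTENSION_TYPES.put(".pdf", "application/pdf");
        EXTENSION_TYPES.put(".txt", "text/plain");
    }

    // Returns the last path segment of the URI, or null when there is none
    // (for example for opaque URIs such as "urn:...").
    static String resourceName(String contentItemUri) {
        String path = URI.create(contentItemUri).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return null;
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Prefers a type derived from the file name and falls back to the
    // sniffed type when the name gives no hint. This is a simplification;
    // a real detector combines both sources more carefully.
    static String detect(String sniffedType, String contentItemUri) {
        String name = resourceName(contentItemUri);
        if (name != null) {
            String lower = name.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, String> entry : EXTENSION_TYPES.entrySet()) {
                if (lower.endsWith(entry.getKey())) {
                    return entry.getValue();
                }
            }
        }
        return sniffedType;
    }

    public static void main(String[] args) {
        // URI with a file name: the name refines the sniffed type.
        System.out.println(detect("application/zip", "http://www.example.com/test.docx"));
        // Opaque URI without a path: the sniffed type is kept.
        System.out.println(detect("application/zip", "urn:content-item-1"));
    }
}
```

With the request from the issue, the id value
http://www.example.com/test.docx yields the resource name test.docx, so
the sketch returns the Word document type instead of the generic archive
type. A content item URI without a usable file name leaves the sniffed
type unchanged.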