Rupert Westenthaler created STANBOL-809:
-------------------------------------------
Summary: Pass ContentItem URI to the Tika content type detector
Key: STANBOL-809
URL: https://issues.apache.org/jira/browse/STANBOL-809
Project: Stanbol
Issue Type: Bug
Components: Engine - Tika
Reporter: Rupert Westenthaler
Priority: Minor
Content type detection could be improved by using the URI of the processed
content item, because the Tika API allows the file name (or URI) of a resource
to be passed explicitly as an input parameter to the content type detection (see
https://tika.apache.org/1.2/detection.html#Resource_Name_Based_Detection):
Metadata m = new Metadata();
// hand the content item URI to Tika as the resource name hint
m.add(Metadata.RESOURCE_NAME_KEY,
    contentItem.getUri().getUnicodeString());
detector.detect(is, m);
This would mean that the filename-pattern-based recognition would
work when the contentItem URI is set manually in the request to the Stanbol
enhancer, e.g.:
curl -X POST -H "Accept: text/turtle" -T test.docx \
http://dev.iks-project.eu:8080/enhancer/engine/tika?id=\
http://www.example.com/test.docx
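
To illustrate why the URI hint matters, here is a minimal, self-contained
sketch of name-based refinement. It is not Tika's implementation: the class
NameHintSketch, the method refine and the three-entry extension table are
hypothetical stand-ins, and the URN in the second call is only an assumed
example of an identifier that carries no file name. The sketch assumes that
content sniffing alone reports a .docx upload as a generic "application/zip",
since a .docx file is a ZIP container.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

public class NameHintSketch {
    // Hypothetical, deliberately tiny extension table (illustration only).
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "pdf", "application/pdf",
        "txt", "text/plain");

    /**
     * Returns the media type implied by the file extension in the URI path,
     * or the sniffed type unchanged when the URI offers no usable hint.
     */
    static String refine(String sniffedType, String contentItemUri) {
        if (contentItemUri == null) {
            return sniffedType;
        }
        // Opaque URIs such as "urn:..." have no path component (getPath() is null).
        String path = URI.create(contentItemUri).getPath();
        if (path == null) {
            return sniffedType;
        }
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        // Require a non-empty extension inside the last path segment.
        if (dot <= slash || dot == path.length() - 1) {
            return sniffedType;
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, sniffedType);
    }

    public static void main(String[] args) {
        // URI with a file name: the extension refines the generic type.
        System.out.println(refine("application/zip", "http://www.example.com/test.docx"));
        // Identifier without a file name: the sniffed type is kept.
        System.out.println(refine("application/zip", "urn:content-item-example"));
    }
}
```

With a URI such as http://www.example.com/test.docx the extension selects the
more specific type; with an identifier that has no file name the sniffed type
is returned unchanged, which corresponds to the situation this issue describes.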