[ 
https://issues.apache.org/jira/browse/STANBOL-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler resolved STANBOL-809.
-----------------------------------------

    Resolution: Implemented
      Assignee: Rupert Westenthaler

Implemented in trunk with http://svn.apache.org/viewvc?rev=1413551&view=rev
                
> Pass ContentItem URI to the Tika content type detector
> ------------------------------------------------------
>
>                 Key: STANBOL-809
>                 URL: https://issues.apache.org/jira/browse/STANBOL-809
>             Project: Stanbol
>          Issue Type: Bug
>          Components: Engine - Tika
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>
> The content type detection could be improved by using the URI of the 
> processed content item, because the Tika API allows the file name (or URI) 
> of a resource to be passed explicitly as an input parameter to the content 
> type detection (see 
> https://tika.apache.org/1.2/detection.html#Resource_Name_Based_Detection):
>     Metadata m = new Metadata();
>     // pass the content item URI as the resource name hint
>     m.add(Metadata.RESOURCE_NAME_KEY,
>         contentItem.getUri().getUnicodeString());
>     MediaType type = detector.detect(is, m);
> This would mean that the file name pattern based recognition would
> work when you manually set the ContentItem URI in the request to the Stanbol 
> Enhancer, e.g.
>      curl -X POST -H "Accept: text/turtle" -T test.docx \
>          "http://dev.iks-project.eu:8080/enhancer/engine/tika?id=http://www.example.com/test.docx"
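
To illustrate why the ContentItem URI is a useful hint, here is a minimal
standalone sketch of the idea behind resource name based detection. It uses
only the JDK and is not Tika's implementation: the class NameHintSketch, its
methods, and the three-entry extension table are hypothetical, and Tika's
real detector uses a much larger set of file name patterns in combination
with magic byte detection.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; names and table contents are assumptions. */
final class NameHintSketch {

    // Hypothetical, deliberately tiny extension table.
    private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<>();
    static {
        BY_EXTENSION.put("docx",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("txt", "text/plain");
    }

    /** Returns the last path segment of the URI, ignoring query and fragment. */
    static String resourceName(String uri) {
        // Opaque URIs such as URNs have no path; getPath() returns null for them.
        String path = URI.create(uri).getPath();
        if (path == null) {
            return "";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Refines a fallback type (e.g. from magic bytes) using the URI's file extension. */
    static String detect(String contentItemUri, String fallback) {
        String name = resourceName(contentItemUri);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return fallback; // no usable extension, keep the fallback type
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, fallback);
    }

    public static void main(String[] args) {
        // A .docx file is a ZIP container, so the fallback here is application/zip.
        System.out.println(detect("http://www.example.com/test.docx", "application/zip"));
        // A URI without a file name leaves the fallback untouched.
        System.out.println(detect("urn:example:no-file-name", "application/zip"));
    }
}
```

With a URI that ends in test.docx the sketch returns the more specific Word
media type. With a URI that carries no file name it keeps the fallback, which
is why setting the id parameter explicitly in the request matters.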

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
