[ 
https://issues.apache.org/jira/browse/TIKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681353#comment-13681353
 ] 

Nick Burch commented on TIKA-1120:
----------------------------------

The latest detection documentation is at 
<https://tika.apache.org/1.3/detection.html>. The URL you referenced is for an 
older version of Tika (0.10).

I don't think people should be doing what your code does. You should really 
go through a TikaConfig object 
<http://tika.apache.org/1.3/api/org/apache/tika/config/TikaConfig.html>, and 
get either a Detector or the mime types registry from it.
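
A minimal sketch of that approach, assuming Tika 1.3 is on the classpath. The 
class name DetectSketch and the method detectType are hypothetical names used 
only for illustration, and Metadata.RESOURCE_NAME_KEY is the constant used in 
the issue description (later Tika releases may name it differently):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

// Hypothetical helper class, not part of Tika.
final class DetectSketch {

    // Detects the media type of a stream, using the file name as a hint.
    static String detectType(InputStream in, String fileName) throws IOException {
        // The default config loads the bundled mime type definitions.
        TikaConfig config = TikaConfig.getDefaultConfig();

        // Alternative: config.getMimeRepository() returns the populated
        // MimeTypes registry, which can also be used as a Detector.
        Detector detector = config.getDetector();

        Metadata md = new Metadata();
        md.set(Metadata.RESOURCE_NAME_KEY, fileName);

        // detect() needs a stream that supports mark/reset.
        try (InputStream bis = new BufferedInputStream(in)) {
            MediaType mediaType = detector.detect(bis, md);
            return mediaType.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectType(new ByteArrayInputStream(data), "note.txt"));
    }
}
```

The point of going through TikaConfig is that the detector it hands back is 
already wired to a populated registry, which a bare new MimeTypes() is not.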

Are you able to suggest some tweaks to the most recent documentation that would 
make this clearer for someone in your situation?
                
> Enable direct use of org.apache.tika.mime.MediaType.detect(...)
> ---------------------------------------------------------------
>
>                 Key: TIKA-1120
>                 URL: https://issues.apache.org/jira/browse/TIKA-1120
>             Project: Tika
>          Issue Type: Wish
>          Components: mime
>    Affects Versions: 1.3
>            Reporter: Oliver Kopp
>            Priority: Minor
>
> When using mime type detection, the classes allow the following usage:
>     try (InputStream is = theInputStream;
>          BufferedInputStream bis = new BufferedInputStream(is);) {
>         MimeTypes mt = new MimeTypes();
>         Metadata md = new Metadata();
>         md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
>         MediaType mediaType = mt.detect(bis, null);
>         return mediaType.toString();
>     }
> When debugging this, the MimeTypes class instantiates its internal patterns 
> with an empty MediaTypeRegistry. Therefore, getDefaultMimeTypes() is never 
> called and thus tika-mimetypes.xml is never read.
> Is it possible to enable direct usage of MediaType.detect()? Like adding a 
> new constructor, where the MediaTypeRegistry can be set? 
> If not, the code comments (or the documentation at 
> https://tika.apache.org/0.10/detection.html) should point out that 
> MimeTypes() should not be instantiated directly for mime type detection, and 
> that the detectors should be used instead. Possibly, a minimal example should 
> be added to make the usage clear.
> The following example works here:
>     try (InputStream is = theInputStream;
>             BufferedInputStream bis = new BufferedInputStream(is);) {
>         AutoDetectParser parser = new AutoDetectParser();
>         Detector detector = parser.getDetector();
>         Metadata md = new Metadata();
>         md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
>         MediaType mediaType = detector.detect(bis, md);
>         return mediaType.toString();
>     }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
