Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "BaysianMimeTypeSelector" page has been changed by Lukeliush:
https://wiki.apache.org/tika/BaysianMimeTypeSelector

New page:

[[https://issues.apache.org/jira/browse/TIKA-1517|TIKA-1517 [MIME type 
selection with probability]]]

The motivation is that the current implementation of MIME type detection in 
MimeTypes is somewhat rigid. At the time of writing, MimeTypes has three 
detection approaches (magic bytes, file extension, and metadata hint matching) 
combined in a fall-back strategy. Detection depends heavily on the magic bytes, 
and the last two approaches (extension and metadata hint matching) play only a 
subsidiary role in the final decision. In other words, the results of the last 
two approaches are considered only when there is a tie to break in magic byte 
detection, since the magic bytes method may estimate multiple MIME types; in 
that situation the extension and the metadata hint are used. It is also 
possible that the type given by extension or metadata hint matching is more 
specialized than the one given by the magic bytes method; in that case the most 
specialized type is returned. This implementation seems somewhat inflexible 
when users want to
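
The fall-back behaviour described above can be illustrated with a small, 
language-neutral sketch. This is not Tika's code or API: the names PARENT, 
is_specialization, apply_hint and select_type, and the tiny type hierarchy, 
are hypothetical and exist only to show how the extension and the metadata hint 
come into play only as tie-breakers or as more specialized refinements of the 
magic bytes result.

```python
# Illustrative sketch only; all names below are hypothetical, not Tika's API.
from typing import Dict, List, Optional

OCTET_STREAM = "application/octet-stream"
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

# Assumed toy hierarchy: child type -> parent (more general) type.
PARENT: Dict[str, str] = {
    DOCX: "application/zip",
    "text/html": "text/plain",
}


def is_specialization(candidate: str, base: str) -> bool:
    """Return True if candidate is a strict descendant of base."""
    current = PARENT.get(candidate)
    while current is not None:
        if current == base:
            return True
        current = PARENT.get(current)
    return False


def apply_hint(candidates: List[str], hint: Optional[str]) -> List[str]:
    """Let a hint override the candidates only if it agrees with or refines one."""
    if hint is None:
        return candidates
    if not candidates:
        # Nothing detected so far, so the hint is all we have.
        return [hint]
    for candidate in candidates:
        if hint == candidate or is_specialization(hint, candidate):
            # The hint breaks a tie or is more specific: it wins.
            return [hint]
    # The hint contradicts the magic bytes result and is ignored.
    return candidates


def select_type(magic_matches: List[str],
                extension_type: Optional[str] = None,
                hint_type: Optional[str] = None) -> str:
    """Magic bytes first, then extension, then metadata hint."""
    candidates = list(magic_matches)
    candidates = apply_hint(candidates, extension_type)
    candidates = apply_hint(candidates, hint_type)
    return candidates[0] if candidates else OCTET_STREAM
```

In this sketch an extension that disagrees with the magic bytes result is 
simply discarded, however reliable it may be, which is the kind of rigidity the 
paragraph above refers to.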



