On Saturday, 16 April 2016 at 19:34:52 UTC, Eugene Wissner wrote:
Wow. I just wanted to port libmagic since I need it. Can you
write a short introduction on how I can work with the magic
database (determining the mime type of a file based on its content)?
Usually mime type detection is done by parsing mime.cache files.
These are binary files that can be mapped into memory. mime.cache
files are generated by update-mime-database from the source
packages (these are in XML format, see
Here's the format spec:
The code in the 'mime' library responsible for parsing such files:
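As a starting point, here is a minimal Python sketch that maps a mime.cache file into memory and reads its header. It assumes the version 1.x layout from the shared-mime-info spec: two big-endian 16-bit version numbers followed by nine big-endian 32-bit offsets, MAGIC_LIST_OFFSET among them. The function names are hypothetical and not taken from the 'mime' library.

```python
import mmap
import struct

# Header fields after the two CARD16 version numbers, in file order
# (assumed 1.x layout). Every field is a big-endian CARD32 offset.
HEADER_FIELDS = (
    "alias_list", "parent_list", "literal_list", "reverse_suffix_tree",
    "glob_list", "magic_list", "namespace_list", "icons_list",
    "generic_icons_list",
)
HEADER_FORMAT = ">HH" + "I" * len(HEADER_FIELDS)


def read_header(buf):
    """Parse the fixed-size header at the start of a mime.cache buffer."""
    values = struct.unpack_from(HEADER_FORMAT, buf, 0)
    major, minor = values[0], values[1]
    offsets = dict(zip(HEADER_FIELDS, values[2:]))
    return major, minor, offsets


def open_cache(path):
    """Map a mime.cache file read-only into memory and parse its header."""
    with open(path, "rb") as f:
        # mmap keeps its own handle, so closing the file object is fine.
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    major, minor, offsets = read_header(buf)
    if major != 1:
        raise ValueError(f"unsupported mime.cache version {major}.{minor}")
    return buf, offsets
```

For example, `buf, offsets = open_cache("/usr/share/mime/mime.cache")` gives the mapped file plus `offsets["magic_list"]`, the position where the MagicList starts. Mapping instead of reading keeps startup cheap, since only the pages actually touched get loaded.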
A mime.cache file has a MagicList entry that stores the magic
rules for mime types. The MagicList consists of Match entries
sorted by priority. Each Match includes the name of the mime type
it relates to and has Matchlet entries as children, which in turn
may have other Matchlets as children (so it's a tree). Each
Matchlet describes part of a magic rule: the content to match and
the position in the file where this content must be found for the
file to be considered of this type. The same information is also
stored in a separate 'magic' file.
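The MagicList tree can be decoded with a minimal Python sketch like the one below. It assumes the record layouts from the spec: a MagicList header of three 32-bit fields (N_MATCHES, MAX_EXTENT, FIRST_MATCH_OFFSET), 16-byte Match records, and 32-byte Matchlet records. All fields are big-endian, and all offsets are absolute positions in the file. Class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Matchlet:
    range_start: int
    range_length: int
    word_size: int
    value: bytes
    mask: Optional[bytes]
    children: List["Matchlet"] = field(default_factory=list)


@dataclass
class Match:
    priority: int
    mime_type: str
    matchlets: List[Matchlet]


def _cstring(buf, offset):
    """Read a NUL-terminated ASCII string starting at `offset`."""
    end = buf.find(b"\0", offset)
    return bytes(buf[offset:end]).decode("ascii")


def read_matchlets(buf, count, offset):
    """Read `count` consecutive 32-byte Matchlet records and, recursively, their children."""
    result = []
    for i in range(count):
        (range_start, range_length, word_size, value_length, value_offset,
         mask_offset, n_children, first_child) = struct.unpack_from(
            ">8I", buf, offset + 32 * i)
        value = bytes(buf[value_offset:value_offset + value_length])
        # MASK_OFFSET is 0 when there is no mask; a mask is as long as the value.
        mask = (bytes(buf[mask_offset:mask_offset + value_length])
                if mask_offset else None)
        children = read_matchlets(buf, n_children, first_child)
        result.append(Matchlet(range_start, range_length, word_size,
                               value, mask, children))
    return result


def read_magic_list(buf, magic_list_offset):
    """Return (max_extent, matches) for the MagicList at `magic_list_offset`."""
    n_matches, max_extent, first_match = struct.unpack_from(
        ">3I", buf, magic_list_offset)
    matches = []
    for i in range(n_matches):
        priority, mime_offset, n_matchlets, first_matchlet = struct.unpack_from(
            ">4I", buf, first_match + 16 * i)
        matches.append(Match(priority, _cstring(buf, mime_offset),
                             read_matchlets(buf, n_matchlets, first_matchlet)))
    return max_extent, matches
```

The `magic_list_offset` argument would be the MAGIC_LIST_OFFSET field from the file header. The returned list keeps the on-disk order, so it is already sorted by priority.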
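The options are described in the spec:
Matchlets have OR logic: if any tree path matches the file
contents (every Matchlet along the path, from the top-level one
down to a leaf, must match), then the file is of the type named in
this Match.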
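Because "any tree path matches" is easy to misread, here is a small Python sketch that spells it out on an abstract tree. The rules along one root-to-leaf path are ANDed, and the paths (and the top-level Matchlets) are ORed. The `Node` type and the `holds` predicate are hypothetical stand-ins for a Matchlet and its byte comparison.

```python
from typing import Iterator, List, NamedTuple, Sequence


class Node(NamedTuple):
    rule: str                        # hypothetical label standing in for a Matchlet's rule
    children: Sequence["Node"] = ()


def paths(node) -> Iterator[List[str]]:
    """Yield every root-to-leaf path of a Matchlet tree as a list of rules."""
    if not node.children:
        yield [node.rule]
        return
    for child in node.children:
        for rest in paths(child):
            yield [node.rule] + rest


def match_applies(roots, holds) -> bool:
    """AND along each path, OR across all paths of all top-level Matchlets."""
    return any(all(holds(rule) for rule in path)
               for root in roots for path in paths(root))
```

Listing paths explicitly is only for illustration. A real implementation would recurse instead: check a Matchlet, and if it holds, try its children until one succeeds.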
For a better demonstration of the recursive nature of the rules,
see the definition of application/x-executable or
application/x-sharedlib in /usr/share/mime/packages/freedesktop.org.xml.
There the <magic> element corresponds to a Match entry in
mime.cache, and the <match> elements correspond to Matchlet entries.
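The same nesting can be walked directly in the XML source. Below is a minimal Python sketch using xml.etree.ElementTree that turns each <magic> element into a list of <match> trees. It assumes the shared-mime-info XML namespace, an inclusive `start:end` form for the offset attribute, and a default priority of 50. The helper names are hypothetical, and `value` is kept as the raw attribute string (escapes are not decoded).

```python
import xml.etree.ElementTree as ET

# Namespace used by shared-mime-info XML package files.
NS = "{http://www.freedesktop.org/standards/shared-mime-info}"


def parse_offset(text):
    """Convert an offset attribute ("4" or an inclusive range "4:8") to (start, length)."""
    start_text, _, end_text = text.partition(":")
    start = int(start_text)
    length = int(end_text) - start + 1 if end_text else 1
    return start, length


def parse_match(elem):
    """Turn a <match> element and its nested <match> children into a dict tree."""
    return {
        "type": elem.get("type"),
        "value": elem.get("value"),  # raw attribute; escapes are not decoded here
        "range": parse_offset(elem.get("offset")),
        "mask": elem.get("mask"),
        "children": [parse_match(c) for c in elem.findall(NS + "match")],
    }


def magic_rules(xml_text):
    """Map each mime type to a list of (priority, [match trees]), one per <magic>."""
    root = ET.fromstring(xml_text)
    rules = {}
    for mime in root.findall(NS + "mime-type"):
        for magic in mime.findall(NS + "magic"):
            priority = int(magic.get("priority", "50"))  # assumed default priority
            trees = [parse_match(m) for m in magic.findall(NS + "match")]
            rules.setdefault(mime.get("type"), []).append((priority, trees))
    return rules
```

To inspect a real definition, open the package file in binary mode (`with open(path, "rb") as f: rules = magic_rules(f.read())`) and look up `rules["application/x-executable"]`.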
So the algorithm is:
1. Iterate over the Match entries in the MagicList.
2. For every Match, iterate over every Matchlet.
3. Recursively apply the Matchlet rule and its children's rules to
the file.
4. If some tree path matches the file contents, the mime type for
this file is found (you don't need to check the following Match
entries, since they have lower or equal priority). Otherwise go to
the next Match in the MagicList.
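The four steps could look roughly like this in Python. This sketch uses hypothetical names and makes several assumptions. It takes already-decoded Match entries in MagicList order. It treats RANGE_LENGTH as the number of consecutive start offsets to try, and applies a mask to both the file bytes and the value. It ignores WORD_SIZE (byte-order handling). It reads only the first MAX_EXTENT bytes of the file.

```python
from typing import NamedTuple, Optional, Sequence


class Matchlet(NamedTuple):
    range_start: int                 # first offset where `value` may start
    range_length: int                # number of consecutive offsets to try
    value: bytes
    mask: Optional[bytes] = None
    children: Sequence["Matchlet"] = ()


class Match(NamedTuple):
    priority: int
    mime_type: str
    matchlets: Sequence[Matchlet]


def _value_at(m, data, pos):
    """Does m.value (masked, if a mask is present) occur at data[pos]?"""
    chunk = data[pos:pos + len(m.value)]
    if len(chunk) < len(m.value):
        return False
    if m.mask is None:
        return chunk == m.value
    return all((c & k) == (v & k) for c, v, k in zip(chunk, m.value, m.mask))


def _tree_matches(m, data):
    """Step 3: m must match, and so must at least one child path (if any)."""
    positions = range(m.range_start, m.range_start + m.range_length)
    if not any(_value_at(m, data, pos) for pos in positions):
        return False
    return not m.children or any(_tree_matches(c, data) for c in m.children)


def detect(matches, data):
    """Steps 1, 2 and 4: the first Match in MagicList order with a matching tree wins."""
    for match in matches:  # already sorted by priority
        if any(_tree_matches(m, data) for m in match.matchlets):
            return match.mime_type  # later Matches have lower or equal priority
    return None


def detect_file(path, matches, max_extent):
    """Read only the first MAX_EXTENT bytes of the file and run detect()."""
    with open(path, "rb") as f:
        return detect(matches, f.read(max_extent))
```

Returning on the first hit is what makes the priority order of the MagicList matter. If two Matches would both fit, the one stored earlier wins.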
See the source code in the 'mime' library responsible for this task:
Note that I did not describe how to determine the mime type when
there is more than one mime.cache file, or how to handle conflicts
and explicitly deleted magic rules. Here's the source code:
Please read the spec.