On Saturday, 16 April 2016 at 19:34:52 UTC, Eugene Wissner wrote:


Wow. I just wanted to port libmagic since I need it. Can you write a short introduction on how I can work with the magic database (determining the mime type of a file based on its content)?

Usually mime type detection is done by parsing mime.cache files. These are binary files that can be mapped into memory. mime.cache files are generated by update-mime-database from source packages, which are XML files (see https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-0.18.html#idm140001680036896 ).

Here's the format spec: https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-0.18.html#idm140001675194688
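To make the "binary file mapped into memory" part concrete, here is a rough sketch in Python (not the D code of the 'mime' library). It assumes the header consists of two big-endian 16-bit version fields followed by nine big-endian 32-bit offsets, in the order given in HEADER_FIELDS; that order is written from memory, so please check it against the spec before relying on it. The names read_header and read_cstring are made up for this example.

```python
import struct

# Assumed layout (verify against the spec): two 16-bit version numbers,
# then nine 32-bit offsets, everything big-endian.
HEADER = struct.Struct(">HH9I")
HEADER_FIELDS = (
    "major_version", "minor_version",
    "alias_list", "parent_list", "literal_list", "reverse_suffix_tree",
    "glob_list", "magic_list", "namespace_list", "icons_list",
    "generic_icons_list",
)


def read_header(buf):
    """Decode the fixed-size header from a bytes-like object (bytes or mmap)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too small for a mime.cache header")
    return dict(zip(HEADER_FIELDS, HEADER.unpack_from(buf, 0)))


def read_cstring(buf, offset):
    """Read a NUL-terminated string stored at `offset`, e.g. a mime type name."""
    end = buf.find(b"\0", offset)
    if end == -1:
        raise ValueError("unterminated string")
    return bytes(buf[offset:end]).decode("utf-8")
```

With a real file you would open it in binary mode and pass mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf; the "magic_list" value is then the offset at which the MagicList starts.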

The code in the 'mime' library responsible for parsing such files: https://github.com/MyLittleRobo/mime/blob/master/source/mime/cache.d

A mime.cache file has a MagicList entry that stores the magic rules for all types. MagicList consists of Match entries sorted by priority. A Match includes the name of the mime type it relates to and has Matchlet entries as children, which in turn may have other Matchlets as children (so it's a tree). Each Matchlet describes one part of a magic rule: the content to match and the position in the file where this content must be found for the file to be of this type. This information is also stored in a separate 'magic' file. The options are described in the spec: https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-0.18.html#idm140001675229440
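As an illustration of what a single Matchlet checks, here is a minimal Python sketch. It works on an in-memory model rather than on the binary layout, and the field names (range_start, range_length, value, mask) are my own labels for "position" and "content". It assumes the mask, when present, has the same length as the value, and it ignores word size and byte order handling, so treat it as a simplification and not as what the library does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Matchlet:
    range_start: int               # first offset at which the value may begin
    range_length: int              # number of candidate start offsets to try
    value: bytes                   # content to look for
    mask: Optional[bytes] = None   # optional bit mask, same length as value


def matchlet_matches(m, data):
    """Return True if the value is found at any start offset in the range."""
    n = len(m.value)
    for start in range(m.range_start, m.range_start + m.range_length):
        chunk = data[start:start + n]
        if len(chunk) < n:
            return False  # ran past the end; later offsets cannot fit either
        if m.mask is None:
            if chunk == m.value:
                return True
        # With a mask, compare only the bits selected by the mask.
        elif all((c & k) == (v & k) for c, v, k in zip(chunk, m.value, m.mask)):
            return True
    return False
```

For example, Matchlet(0, 1, b"\x89PNG") matches any data that starts with the PNG signature bytes, and a range_length greater than 1 lets the value float within a window of start offsets.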

Matchlets are combined with OR logic: if any path through the tree matches the file contents (every Matchlet along that path must match), then the file is of the type named in this Match.

For a better demonstration of the recursive nature of the rules, see the definition of application/x-executable or application/x-sharedlib in /usr/share/mime/packages/freedesktop.org.xml. There the <magic> element corresponds to a Match entry in mime.cache and the <match> elements correspond to Matchlet entries.

So the algorithm is:

1. Iterate over the Match entries in MagicList.
2. For every Match, iterate over its Matchlets.
3. Recursively apply each Matchlet rule and its child rules to the file content.
4. If some tree path matches the file contents, the mime type for this file is found (you don't need to check the following Match entries, since they have lower or equal priority). Otherwise go to the next Match in MagicList.
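The steps above can be sketched roughly as follows, again in Python and on an in-memory model instead of the real binary entries. Matchlets are simplified here to a fixed offset and a byte string (no ranges or masks), and the RULES list is a hand-written illustration loosely modeled on the ELF rules, with a made-up priority; it is not copied from freedesktop.org.xml. The list is assumed to be sorted by descending priority already.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Matchlet:
    offset: int
    value: bytes
    children: List["Matchlet"] = field(default_factory=list)


@dataclass
class Match:
    priority: int
    mime_type: str
    matchlets: List[Matchlet]


def tree_matches(m, data):
    """A matchlet matches if its own content matches AND (it is a leaf OR any child matches)."""
    if data[m.offset:m.offset + len(m.value)] != m.value:
        return False
    return not m.children or any(tree_matches(c, data) for c in m.children)


def detect(magic_list, data):
    """Return the mime type of the first Match with a matching tree path, else None."""
    for match in magic_list:                                   # step 1
        if any(tree_matches(m, data) for m in match.matchlets):  # steps 2 and 3
            return match.mime_type                              # step 4: stop early
    return None


def elf_rule(e_type):
    # "\x7fELF" at 0, little-endian marker at 5, 16-bit e_type at 16.
    return Matchlet(0, b"\x7fELF", [Matchlet(5, b"\x01", [Matchlet(16, e_type)])])


# Illustrative rules only; priority 50 is an arbitrary placeholder.
RULES = [
    Match(50, "application/x-executable", [elf_rule(b"\x02\x00")]),
    Match(50, "application/x-sharedlib", [elf_rule(b"\x03\x00")]),
]
```

Calling detect(RULES, data) with the first bytes of a file returns the mime type name or None. Since the first hit wins, the order of the list is what implements the priority.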

See the source code in the 'mime' library responsible for this task: https://github.com/MyLittleRobo/mime/blob/master/source/mime/cache.d#L463

Note that I did not describe how to determine the mime type when there is more than one mime.cache file, or how to handle conflicts and explicitly deleted magic rules. Here's the source code, though: https://github.com/MyLittleRobo/mime/blob/master/source/mime/detectors/cache.d#L210

Please read the spec.
