Not that anyone cares or doesn't know, but…

File systems seem simple: they store blobs of arbitrary, opaque data (files), give them names that are arbitrary text strings of limited length, usually organize them in a hierarchy, and offer a few other features. But they are complicated, must be reliable, and are hard to get right.

Databases care very much about the data they store, care deeply about the "naming" of that data, usually offer complicated ways of organizing it, offer a more complicated set of features than file systems do, must be reliable, and are hard to get right.

They are different. A common pattern is to use them together. Store the metadata in a database (title, genre, date, running time, director, actors, screenwriter, MPEG path), and index most of that metadata to make it easy to search. But the fundamental data itself, the stuff we really care about (in this example, the MPEG data), is probably not going to be stored in the database. It will be opaque blobs (files), hashed into a directory path and stored in a file system.
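For the concrete-minded, here is a minimal Python sketch of that split, assuming SQLite for the catalog and the SHA-256 of the contents as the hashed directory path. The table, column, and function names are hypothetical, and a real system would also have to worry about fsync, garbage-collecting orphaned blobs, and concurrent writers.

```python
import hashlib
import os
import sqlite3
import tempfile
from pathlib import Path


def store_blob(root: Path, data: bytes) -> Path:
    """Write data under a path derived from its SHA-256 digest; return that path."""
    digest = hashlib.sha256(data).hexdigest()
    # Two levels of fan-out (root/3a/7b/3a7b...) keep each directory small.
    path = root / digest[:2] / digest[2:4] / digest
    if not path.exists():  # identical content is stored only once
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name, then rename over the final name,
        # so a reader never sees a half-written blob at `path`.
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        os.replace(tmp, path)
    return path


def open_catalog(db_path: str) -> sqlite3.Connection:
    """Open (or create) the metadata database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS movies (
               id        INTEGER PRIMARY KEY,
               title     TEXT NOT NULL,
               director  TEXT,
               year      INTEGER,
               mpeg_path TEXT NOT NULL
           )"""
    )
    # Index the metadata people actually search on.
    conn.execute("CREATE INDEX IF NOT EXISTS movies_title ON movies(title)")
    conn.execute("CREATE INDEX IF NOT EXISTS movies_director ON movies(director)")
    return conn


def add_movie(conn: sqlite3.Connection, root: Path, title: str,
              director: str, year: int, mpeg: bytes) -> Path:
    """Put the MPEG bytes in the file system and the metadata in the database."""
    path = store_blob(root, mpeg)
    with conn:  # commits the INSERT on success, rolls back on error
        conn.execute(
            "INSERT INTO movies (title, director, year, mpeg_path) VALUES (?, ?, ?, ?)",
            (title, director, year, str(path)),
        )
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        root = Path(tmpdir) / "blobs"
        conn = open_catalog(os.path.join(tmpdir, "catalog.db"))
        add_movie(conn, root, "Example Film", "A. Director", 1999, b"not really MPEG")
        # Search the database, then read the blob from the file system.
        (mpeg_path,) = conn.execute(
            "SELECT mpeg_path FROM movies WHERE director = ?", ("A. Director",)
        ).fetchone()
        print(Path(mpeg_path).read_bytes())  # b'not really MPEG'
        conn.close()
```

The database never holds the video bytes, only a path to them; the file system never has to know what a "director" is.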

Use the database for the stuff it is good at, use the file system for the stuff it is good at. Appreciate the difference.

Even if one gets into the new "AI" stuff, where the database might know much more about the data and be able to search on lots of fuzzy internal details, this extra knowledge is still just a kind of indexing, and the fundamental data will still be stored out in files.

In the case of e-mail there is naturally a bunch of structured metadata, and it is well suited to storage in a database. There might be interesting indexing on the contents of e-mail, but the bodies of the messages (which might range from a few bytes to megabytes long and might be any kind of text-represented data) are really well suited to live in files. Maybe the e-mail system wants to parse the various attachment standards, but the attachments themselves are even better suited to being stored in files.
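Same idea for mail, as a minimal sketch: Python's standard `email` package pulls a few headers out into an SQLite table, while the raw message and each decoded attachment go into content-hashed files. The schema, the function names, and the one-level directory fan-out are hypothetical; a real mail store would also have to handle duplicate Message-IDs, odd encodings, and cleanup.

```python
import hashlib
import sqlite3
from email import message_from_bytes, policy
from pathlib import Path


def write_blob(root: Path, data: bytes) -> str:
    """Store data in a file named by its SHA-256 digest; return the path as text."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest[:2] / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return str(path)


def open_mail_store(db_path: str) -> sqlite3.Connection:
    """Create the (hypothetical) message and attachment tables."""
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id         INTEGER PRIMARY KEY,
            message_id TEXT,
            sender     TEXT,
            subject    TEXT,
            sent       TEXT,
            raw_path   TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS messages_sender ON messages(sender);
        CREATE TABLE IF NOT EXISTS attachments (
            id           INTEGER PRIMARY KEY,
            message      INTEGER NOT NULL REFERENCES messages(id),
            filename     TEXT,
            content_type TEXT,
            path         TEXT NOT NULL
        );
        """
    )
    return conn


def ingest(conn: sqlite3.Connection, root: Path, raw: bytes) -> int:
    """Index one raw e-mail message: headers to the database, bytes to files."""
    msg = message_from_bytes(raw, policy=policy.default)

    def header(name):
        value = msg[name]  # None when the header is absent
        return None if value is None else str(value)

    with conn:
        cur = conn.execute(
            "INSERT INTO messages (message_id, sender, subject, sent, raw_path)"
            " VALUES (?, ?, ?, ?, ?)",
            (header("Message-ID"), header("From"), header("Subject"),
             header("Date"), write_blob(root, raw)),
        )
        row_id = cur.lastrowid
        # Decoded attachments get their own files; the raw message file
        # remains the complete original.
        for part in msg.iter_attachments():
            payload = part.get_payload(decode=True) or b""
            conn.execute(
                "INSERT INTO attachments (message, filename, content_type, path)"
                " VALUES (?, ?, ?, ?)",
                (row_id, part.get_filename(), part.get_content_type(),
                 write_blob(root, payload)),
            )
    return row_id
```

Searching by sender or subject then touches only the database; the bytes come out of the file system only when someone actually opens a message.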

Different file systems will have different features, and so will different databases. In some cases the two will blur into each other, but they should be thought about differently as one chooses how to use one or the other in any design.


Don't underestimate how hard a file system is to make. My wife's work uses Google's web apps, but they might switch to MS's competing products (horrors). One of the complaints is that Google can't reliably store "files": they move around and get lost. Maybe Google stores the files themselves as blobs out in a file system, but all the metadata about each file, including the simulated "location" presented to the user, is stored in a database. And that is hard to get right. (Particularly in this world of continuous integration/continuous deployment, which worships feature velocity, doesn't design before it builds, and fired the QA department.)


-kb, the Kent who will shut up now.

_______________________________________________
Discuss mailing list
[email protected]
https://lists.blu.org/mailman/listinfo/discuss
