[mdb-users] fyi - Muldis D ref impl MREE - in memory dbs

Darren Duncan Sun, 27 Jun 2010 14:26:13 -0700

P.S. I originally sent a slightly different version of this message to the TTMmailing list on June 10th. I am now also sending it to you, for yourinformation. Feedback is optional but always welcome. I will also follow up toboth lists with a second, brief message on implementation shortly.

----------

Although I'm deferring actually writing it for about 2 more months because I
need to focus on a paying client, I have been thinking recently about design
matters for "Muldis Rosetta Example Engine" (MREE), which is to be the
self-contained reference implementation of my Muldis D language.


I have already long-since decided that MREE would use just "in-memory
databases", meaning that any database currently in use by the DBMS would exist
entirely in RAM on a single machine, as in-memory data structures of a single
operating-system process of the DBMS.  MREE would be full-featured in every way
(including being ACID) except for being size-constrained by how much RAM the
user has available, which at present would be on the order of anywhere from a
few tens of megabytes to a few gigabytes.

While MREE could conceivably support multiple on-disk formats, it would
canonically use just one, which is a serialization of the entire database as a
single Muldis D source code file, as defined by a "depot" node of the PTMD_STD
grammar.  When a database is just being used read-only, such as if it is just
actually a code library or program, this file can exist standalone in the file
system, and that file name is what is used to reference the database by users.
When a database is being used subject-to-update, then this file must instead
exist in a folder, and the folder name is what is referenced as the database by
users; during runtime, other temporary files would also be created by the DBMS
within that folder.  The folder format can also be used for read-only.  Either
way, this entire file or folder corresponds to the Muldis D concept of a
"depot", which is the widest scope over which a whole-database constraint may be
defined, or over which user-defined types and routines using them must live
together.  Loosely speaking, the folder format corresponds to a Postgres
database cluster, and the standalone file format corresponds to a SQLite
database file.

Keeping everything on disk in source code means that everything is user-readable
and writeable, and the same files can be used directly as input or output by
other implementations of Muldis D, analogously to SQL database dumps.  More
importantly, users just write Muldis D code as those text files, like in a
normal language, and it is immediately available for use by a DBMS.

A (possibly subject-to-update) folder-format-depot may only be used by a single
operating-system process at a time, but a (just read-only) file-format-depot may
be used by multiple OS processes at a time.  The same OS process may use
multiple depots of both kinds at a time.  If multiple concurrent OS processes
want to use the same database subject-to-update, then there must be a single
process which serves as a common intermediary, which is a database server (the
server is what would contain the whole database in RAM).  This is similar to
Postgres, and in contrast to SQLite, where multiple processes instead use the
database directly, mediated just by the likes of file-system semaphores.  Like
with Postgres, a folder-format-depot would contain a temporary PID file to track
which process stakes a claim on that database.  Generally speaking, no
file-system locks or mechanisms like SQLite uses are used by MREE, as it is
assumed only one process will use a database at once where any may update.

An OS process may or does subsume all folder-format-depots it has open under a
common transaction, meaning cross-database transactions.  When a transaction is
in progress, all such depots would each contain a temporary file naming all of
the depots (possibly 2 duplicate such files) so that all of the depots can be
recovered to a synchronized consistent state.

The in-RAM representation of a whole depot, as a data structure, as with all
transient data structures, is implemented as a functional language would, with
everything read-only, and as something analogous to singly-linked lists.
Multiple versions of the same database in RAM just store the unchanged portions
once, and for each the database structure is accessed via a pointer, and
automatic garbage collection takes care of portions that are no longer
needed/referenced.  This read-only MVCC all-in-RAM nature keeps a lot simpler
than otherwise, as a lot of complexity concerning managing locks is avoided, as
is a lot of complexity for mapping on-disk structures to in-memory ones.  Also,
the need for indexes would then be lessened, as any indexes would just be
produced in memory and only serve to speed up straight in-memory operations,
rather than reducing the number of disk accesses, and fewer indexes means more
actual data can fit in memory.

While the in-memory database design would incur a startup penalty, while reading
and parsing the on-disk serialized file, it is expected in general that the DBMS
would be a long-running process, and so most database access would be processed
entirely in memory, and so perform a lot faster than otherwise.  Durability is
achieved in the longer term with snapshots, where the whole in-memory database
is serialized to disk as a file such as it was read with, and in the shorter
term using transaction logging, where just the small amount of Muldis D code to
apply a delta to the current commit from its predecessor commit is saved to
disk, in a separate file.  The logging approach is done in the general case so
that the performance penalty of having full durability is minimized, and a
snapshot is done less often to effectively consolidate the log into the initial
database file, so to speed a later recovery or a DBMS restart.  Saving a
snapshot mainly involves writing out a new main file and then afterwards
deleting the old one.  The logs are just used in recovery; writing a
consolidation snapshot just writes everything from RAM and then deletes them.
One might consider the transaction logs to be WAL/write-ahead-logs, but in
practice for an in-memory database, it seems more practical to apply the changes
in memory first, since if it fails or is a no-op then the transaction is a no-op
anyway and the writing to disk can be avoided, and if the DBMS dies, it makes no
difference to recovery.

Now, a question I have for you on all this is whether it is likely that all
databases likely to be used in a demonstration or testing scenario, including
automated or stress testing, or databases used in greater than say 50% of all
production database use, are small enough to fit in RAM at once?  Especially the
complete database of any small business.  Or any well-known example databases.

One use that I'm positioning MREE for is to be a small dependency that
application developers can use instead of another DBMS while they are building,
testing, or demoing the application, which may normally be used with a much
larger data set than they would use in development or demoing.  So they have a
much more portable dev or demo environment.  Similarly, in particular, I'm
positioning MREE as a default DBMS to use by developers of database access
toolkits or ORMs, to use in the test suites of said tools, so that the tools can
be completely (as possible) tested in isolation from any large complex
dependencies that a typical DBMS would be.  Sort of like how SQLite can be used
except that MREE has all the features which SQLite lacks but big DBMSs have.

Any thoughts on anything here?

[]

-- Darren Duncan

_______________________________________________
muldis-db-users mailing list
muldis-db-users@mm.darrenduncan.net
http://mm.darrenduncan.net/mailman/listinfo/muldis-db-users

[mdb-users] fyi - Muldis D ref impl MREE - in memory dbs

Reply via email to