On 6/20/2011 3:22 AM, Julian Leviston wrote:
On 20/06/2011, at 8:06 PM, BGB wrote:
hmm... S-Expression database?...
sort of like a hybrid between a database and a persistent store.
or such...
I'd like to know if you think there's a difference between a filesystem and a
database... conceptually...
or such...
interesting thought...
well, filesystems are generally hierarchical, don't generally support
user-defined fields, and store all data as large binary globs. some
filesystems (such as HFS/HFS+ and NTFS) allow multiple forks per file,
others don't (FAT32 / EXT* / ...).
generally, filesystems store data in terms of disk-blocks, and often
make little/no provision for tiny files (partial exception being EXT*
and ReiserFS, the former supporting inlining the contents of small files
into the inode, the latter supporting the storage of sub-block files
into B+Tree leaves).
databases may come in any number of varieties, and generally store
structured or semi-structured information;
the contents of a database are generally individual datums, rather than
binary globs;
typically, databases store contents according to user-defined structures;
typically, databases are stored using structures such as B+Trees;
...
for example, the HDB used for my VM framework has a structure like:
each node is identified by a path, generally with a POSIX-style name
"foo/bar/baz";
each node may contain a number of key/value pairs, with a special
null/default key internally named '_', and the key is identified
following ':', and sometimes key indices are used following a '.', ...
stored values are generally untyped, and treated as if they were strings.
"foo/bar/baz:name"
node: "foo/bar/baz"
key: "name"
"foo/bar/baz:item.3"
node: "foo/bar/baz"
key: "item"
index: 3
the index allows treating the key like a small array.
possible, but not currently supported (at least not efficiently) would
be to apply an index to a node, allowing treating the node as a
makeshift indexed table:
"foo/bar/baz.59:name"
the optimization would be to transform the above into a key-index:
"foo/bar/baz:name.59", allowing for storing the values as arrays.
however, in this case, table-like nodes would not allow for indexed keys:
"foo/bar/baz.59:item.5"
since this would require double-indexing, but the keys are presently
only defined as 1D arrays.
at present, this will work, as the node index would just store multiple
nodes with dotted names ("baz.59" will simply be interpreted as a nodes'
name).
as-is, node names may also begin with '.', however these nodes are
treated as "volatile" and effectively will not be saved back into the
database (this mechanism is generally used for storing "volatile"
information, namely information about things like environment variables,
which libraries or modules are currently loaded, ...).
so ".var/foo:name" is volatile, as neither ".var" nor any of its
children will be saved to disk.
note:
the present system does not allow for non-trivial queries (besides
"?next", "?prev", ...), but interestingly, for most uses of this system,
one doesn't generally need to perform queries (much like, for the most
part, application code doesn't really need "readdir" either, since its
main use is mostly for presenting directory listings to the user).
it is worth noting that the design of this system was originally partly
inspired by the Windows Registry, but there are differences (the nodes
being untyped, and the use of '/' rather than '\', and the present lack
of "hives" albeit a similar mechanism may be needed for other reasons,
and may need to be retrofitted on, that or supporting in-database
"mounts" and "links"...).
the external serialized database format somewhat resembles '.reg' files
(albeit, technically if ".reg" and ".ini" files hybridized).
unlike more proper databases, the current implementation uses AVL trees
rather than B+Trees internally. the main reason for AVL trees was that
they were less effort to implement (actually, it is a tree of AVL trees,
as each node-level starts a new AVL tree).
the internal structure of the code for managing the HDB is a bit nasty,
luckily, nearly all access is via named path-strings (the path-key then
is its canonical structure, and all of the mucking around with AVL
trees/... is hidden behind the API).
as I think noted previously, the main reason I chose an HDB style design
(rather than an RDB / Relational Database) was because an HDB is
generally much easier to work with via a programmatic interface, whereas
working with relational databases is a bit more painful, as nearly every
operation will involve queries and having to work with tables or similar...
or such...
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc