On 6/20/2011 3:22 AM, Julian Leviston wrote:
On 20/06/2011, at 8:06 PM, BGB wrote:

hmm... S-Expression database?...
sort of like a hybrid between a database and a persistent store.


or such...

I'd like to know if you think there's a difference between a filesystem and a 
database... conceptually...

or such...

interesting thought...

well, filesystems are generally hierarchical, don't generally support user-defined fields, and store all data as large binary globs. some filesystems (such as HFS/HFS+ and NTFS) allow multiple forks per file, others don't (FAT32 / EXT* / ...).

generally, filesystems store data in terms of disk-blocks, and often make little/no provision for tiny files (partial exception being EXT* and ReiserFS, the former supporting inlining the contents of small files into the inode, the latter supporting the storage of sub-block files into B+Tree leaves).


databases may come in any number of varieties, and generally store structured or semi-structured information; the contents of a database are generally individual datums, rather than binary globs;
typically, databases store contents according to user-defined structures;
typically, databases are stored using structures such as B+Trees;
...


for example, the HDB used for my VM framework has a structure like:
each node is identified by a path, generally with a POSIX-style name "foo/bar/baz"; each node may contain a number of key/value pairs, with a special null/default key internally named '_', and the key is identified following ':', and sometimes key indices are used following a '.', ...
stored values are generally untyped, and treated as if they were strings.

"foo/bar/baz:name"
    node: "foo/bar/baz"
    key: "name"
"foo/bar/baz:item.3"
    node: "foo/bar/baz"
    key: "item"
    index: 3

the index allows treating the key like a small array.

possible, but not currently supported (at least not efficiently) would be to apply an index to a node, allowing treating the node as a makeshift indexed table:
"foo/bar/baz.59:name"

the optimization would be to transform the above into a key-index:
"foo/bar/baz:name.59", allowing for storing the values as arrays.

however, in this case, table-like nodes would not allow for indexed keys:
"foo/bar/baz.59:item.5"
since this would require double-indexing, but the keys are presently only defined as 1D arrays.

at present, this will work, as the node index would just store multiple nodes with dotted names ("baz.59" will simply be interpreted as a nodes' name).

as-is, node names may also begin with '.', however these nodes are treated as "volatile" and effectively will not be saved back into the database (this mechanism is generally used for storing "volatile" information, namely information about things like environment variables, which libraries or modules are currently loaded, ...).

so ".var/foo:name" is volatile, as neither ".var" nor any of its children will be saved to disk.


note:
the present system does not allow for non-trivial queries (besides "?next", "?prev", ...), but interestingly, for most uses of this system, one doesn't generally need to perform queries (much like, for the most part, application code doesn't really need "readdir" either, since its main use is mostly for presenting directory listings to the user).

it is worth noting that the design of this system was originally partly inspired by the Windows Registry, but there are differences (the nodes being untyped, and the use of '/' rather than '\', and the present lack of "hives" albeit a similar mechanism may be needed for other reasons, and may need to be retrofitted on, that or supporting in-database "mounts" and "links"...).

the external serialized database format somewhat resembles '.reg' files (albeit, technically if ".reg" and ".ini" files hybridized).

unlike more proper databases, the current implementation uses AVL trees rather than B+Trees internally. the main reason for AVL trees was that they were less effort to implement (actually, it is a tree of AVL trees, as each node-level starts a new AVL tree). the internal structure of the code for managing the HDB is a bit nasty, luckily, nearly all access is via named path-strings (the path-key then is its canonical structure, and all of the mucking around with AVL trees/... is hidden behind the API).

as I think noted previously, the main reason I chose an HDB style design (rather than an RDB / Relational Database) was because an HDB is generally much easier to work with via a programmatic interface, whereas working with relational databases is a bit more painful, as nearly every operation will involve queries and having to work with tables or similar...


or such...


_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc

Reply via email to