[nodejs] Re: Intrinsic datastores for Node.js

Juraj Vitko Wed, 22 Feb 2012 02:04:38 -0800

Dobes, good that you brought up the Globals DB here. When compiling a
list of existing Node.js native addon DB implementations for the
nodejsdb.com page, I did not include the embedded Globals because
setting it up did not seem straightforward - e.g. you need a Linux
machine and have to perform some setup steps outside of `npm install`. I
haven't used Globals yet and I don't fully understand it from their
page, but I'm sure it would be good to have it as an in-process /
intrinsic option.


Re: your reasoning about standalone DBs - that reasoning is of course
sound - but what I'm proposing here is finding a basic/primitive set
of DB building blocks which, besides allowing the implementation of
simplistic intrinsic/in-process DBs with minimal config and easy
deployment, would also allow the implementation (and, more importantly,
JS-like evolution) of standalone database products based on Node.js.
One can imagine this being similar to bolting V8 onto MySQL, or Lua onto
Redis, but the other way around - bolting a DB onto Node, in the form of
a few functions in a native addon, and implementing all the complex
logic in JS. (Perhaps later implementing proven concepts in
github.com/d5/node.native.)

(On the nodejsdb.com page I'm listing the implementations I'm
currently aware of. Here I'm trying to get random input from the
Node.js community. Also, by talking about this, I'm less likely to
shelve it into the "don't have time" bucket.)

As I said above, I think a lot could be done on top of very simple
key-value, ordered key-value and fast-list APIs. E.g. if someone
wanted to write an ACID SQL engine on top of them, they should be able
to.

It just occurred to me today that an ordered map could in fact be
(internally) implemented on top of a fast unordered key-value map, by
storing the red-black tree nodes in it as key-value pairs, where the
value is [left-key,right-key,data] (plus the node's color bit). This
could make a lot of sense for cache-optimized key-value implementations
like the one I spotted today: http://read.seas.harvard.edu/~kohler/pubs/mao12cache.pdf
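A minimal sketch of that idea, in JavaScript to match the rest of the thread and assuming a JS `Map` as a stand-in for the fast unordered store: every tree node is stored under its own key, and the stored value holds the keys of its children instead of pointers, plus the color flag a red-black node needs. The class name `KVOrderedMap` is hypothetical, and the balancing follows Sedgewick's left-leaning red-black (LLRB) variant rather than classic red-black insertion, because it is much shorter to write.

```javascript
// Sketch: an ordered map layered on an unordered key-value store.
// Each node lives under its own key; the value is { left, right, data, red },
// where left/right are child KEYS, not pointers. A JS Map stands in for the
// (hypothetical) fast unordered store. Balancing: left-leaning red-black tree.
class KVOrderedMap {
  constructor(store = new Map()) {
    this.store = store;  // key -> { left, right, data, red }
    this.rootKey = null; // a real store would need to persist this too
  }

  node(k) { return k === null ? null : this.store.get(k); }
  isRed(k) { return k !== null && this.store.get(k).red; }

  // Standard LLRB rotations, following child keys instead of pointers.
  rotateLeft(hk) {
    const h = this.node(hk), xk = h.right, x = this.node(xk);
    h.right = x.left;
    x.left = hk;
    x.red = h.red;
    h.red = true;
    return xk; // key of the new subtree root
  }

  rotateRight(hk) {
    const h = this.node(hk), xk = h.left, x = this.node(xk);
    h.left = x.right;
    x.right = hk;
    x.red = h.red;
    h.red = true;
    return xk;
  }

  flipColors(hk) {
    const h = this.node(hk);
    h.red = true;
    this.node(h.left).red = false;
    this.node(h.right).red = false;
  }

  // Recursive insert; returns the key of the (possibly rotated) subtree root.
  // Node objects are mutated in place; an external store would need an
  // explicit write-back after each mutation.
  insert(hk, key, data) {
    if (hk === null) {
      this.store.set(key, { left: null, right: null, data, red: true });
      return key;
    }
    const h = this.node(hk);
    if (key < hk) h.left = this.insert(h.left, key, data);
    else if (key > hk) h.right = this.insert(h.right, key, data);
    else h.data = data; // existing key: update in place

    let k = hk;
    if (this.isRed(this.node(k).right) && !this.isRed(this.node(k).left)) k = this.rotateLeft(k);
    if (this.isRed(this.node(k).left) && this.isRed(this.node(this.node(k).left).left)) k = this.rotateRight(k);
    if (this.isRed(this.node(k).left) && this.isRed(this.node(k).right)) this.flipColors(k);
    return k;
  }

  put(key, data) {
    this.rootKey = this.insert(this.rootKey, key, data);
    this.node(this.rootKey).red = false; // the root is always black
  }

  // Point lookups skip the tree entirely: a single hash-map get.
  get(key) {
    const n = this.store.get(key);
    return n === undefined ? undefined : n.data;
  }

  // In-order traversal yields [key, data] pairs sorted by key.
  *entries(k = this.rootKey) {
    if (k === null) return;
    const n = this.node(k);
    yield* this.entries(n.left);
    yield [k, n.data];
    yield* this.entries(n.right);
  }
}
```

One side effect of keying each node by its data key is that `get` is a single unordered lookup that never walks the tree; only ordered scans such as `entries` follow the child links.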


On Feb 22, 6:05 am, Dobes <[email protected]> wrote:
> There was recently some discussion around the "globals" database which
> is a native node.js module providing a sorted key/value store you
> could build some other kinds of database on top of.
>
> Berkeley DB (http://www.oracle.com/technetwork/database/berkeleydb/overview/index....)
> is an embedded database that has been popular in the past.  It hasn't
> been ported to a node.js module yet, but it could be a good
> candidate.  It allows you to use it as a simple key/value store or as
> a transactional, distributed database.
>
> In response to another poster, I'd be wary of database systems that
> just try to "transparently" persist objects you're changing because
> you're opening up a can of worms when it comes to concurrency control,
> unless you only allow single-process access to the database.
>
> If you look at other programming languages and stacks you can see a
> variety of in-process databases available, but they aren't as popular
> as databases that run separately and are accessed via an API.  Off the
> top of my head I'd guess that is because:
>
> 1. People want to run multiple applications that share the same
> database, or a cluster where there are more app server instances than
> database instances.  In this case the database can operate more
> robustly in a single process handling things like caching, concurrency
> control, and so on.  Even if you have only a single web server
> instance you often want to run background jobs and workers that need
> concurrent access.
> 2. When you start to split your application across multiple pieces of
> hardware (or just multiple virtual machines) you're going to have to
> introduce a TCP API of some sort anyway, so your in-process database
> has a limited lifespan.
>
> I think there is a class of applications that makes good use of
> embedded databases - for example, desktop applications.  When you have
> a single-tenant database running on just one computer, the embedded
> database makes sense.  This is where you see embedded databases like
> Berkeley DB and SQLite being used quite commonly.  Even applications
> people consider to be "document" applications like Microsoft Word are
> actually now using some kind of database engine under the hood.
>
> If an embedded database is suitable for your future plans, it's
> probably best to wrap an existing battle-tested library like SQLite
> or Berkeley DB (there are a few others, too, whose names I cannot
> recall).  Or use one that's already wrapped like GlobalsDB if it suits
> your needs.
>
> Cheers,
>
> Dobes
>
> On Feb 21, 5:09 pm, Juraj Vitko <[email protected]> wrote:
>
> > https://github.com/ypocat/nodejsdb (or http://nodejsdb.com)
>
> > tl;dr - There are standalone database products (free or not), and
> > that's perfectly cool, but we already know how that works, so let's
> > try something different now.
>
> > The general idea is to get Node.js and a data storage engine into a
> > tighter relationship, primarily to have more control over the data, but
> > also a simpler stack, and even higher performance in accessing the data.
>
> > I'm using the name "Intrinsic" because "In-process" is not exactly
> > accurate. E.g. there may be a shared-memory implementation shared by
> > multiple Nodes, or a synchronized in-process implementation shared by
> > different Node Isolates (if these make it into Node), etc.
>
> > I really like the base concept of Redis, because it provides simple,
> > reliable, predictable and fast primitive building blocks (in the form
> > of commands) which can support various app logic strategies, and it
> > doesn't hide the complexities and overheads of storing and querying
> > data the way more complex DBs do. (So you are more likely to end up
> > with a more stable production system, instead of fiascos with
> > overflowing shards etc.)
>
> > This is also a vague follow-up to this discussion (in this
> > group): http://goo.gl/mDWqR - although I believe we should not insist
> > only on in-memory implementations at this time.
>
> > As for the set of basic data structures and operations that I
> > believe would support the above, I think we need:
>
> > 1) fast unordered Hash Map (key, value)
> > (candidate: http://code.google.com/p/sparsehash/)
>
> > 2) Ordered Map (with minimal empty 'value' overhead to allow for an
> > Ordered Set implementation if someone wants it)
> > (candidate: http://www.cs.princeton.edu/~rs/talks/LLRB/08Penn.pdf)
>
> > 3) a list that can be used for FIFO, LIFO, stack, etc. - probably
> > something close to STL's deque (http://www.cplusplus.com/reference/stl/deque/).
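For item 3, here is a sketch under the assumption that a growable ring buffer is good enough: it gives amortized O(1) push and pop at both ends, so the same structure can serve as a FIFO queue (`pushBack` + `popFront`) or a LIFO stack (`pushBack` + `popBack`). `Deque` and its method names are hypothetical. Note that `std::deque` is usually implemented with fixed-size blocks rather than a single ring, so this only mimics its interface, not its memory layout.

```javascript
// Sketch: a growable ring-buffer deque usable as FIFO, LIFO or double-ended queue.
class Deque {
  constructor(capacity = 8) {
    this.buf = new Array(Math.max(1, capacity));
    this.head = 0; // index of the first element
    this.size = 0; // number of stored elements
  }

  get length() { return this.size; }

  // Double the capacity and unwrap the ring so the first element sits at index 0.
  grow() {
    const old = this.buf, n = old.length;
    this.buf = new Array(n * 2);
    for (let i = 0; i < this.size; i++) this.buf[i] = old[(this.head + i) % n];
    this.head = 0;
  }

  pushBack(v) {
    if (this.size === this.buf.length) this.grow();
    this.buf[(this.head + this.size) % this.buf.length] = v;
    this.size++;
  }

  pushFront(v) {
    if (this.size === this.buf.length) this.grow();
    this.head = (this.head - 1 + this.buf.length) % this.buf.length;
    this.buf[this.head] = v;
    this.size++;
  }

  popFront() {
    if (this.size === 0) return undefined;
    const v = this.buf[this.head];
    this.buf[this.head] = undefined; // drop the reference so it can be GC'd
    this.head = (this.head + 1) % this.buf.length;
    this.size--;
    return v;
  }

  popBack() {
    if (this.size === 0) return undefined;
    const i = (this.head + this.size - 1) % this.buf.length;
    const v = this.buf[i];
    this.buf[i] = undefined;
    this.size--;
    return v;
  }
}
```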
>
> > I think the API for the above should be as simple as possible, so that
> > we can have multiple implementations and various optimizations later,
> > while keeping the amount of needed work down. Also, a terse API is
> > simple to use.
>
> > From Node, we could do something like:
>
> > require('a-nodejsdb-impl').open('/path/db', function(err, db) {
> >   if (err) throw err;
> >   var users = db.map('users');                            // unordered: id -> user
> >   var users_ordered_by_email = db.smap('users_by_email'); // ordered: email -> id
> >   // keep the ordered email index in sync with every write to users
> >   users.on('put', function(k, v) {
> >     users_ordered_by_email.put(v.email, k);
> >   });
> >   users.put(1234, { fname: 'john', lname: 'smith', email: '[email protected]' });
> > });
>
> > ...which implements a basic User table with a primary key on ID and an
> > ordered index on Email. (The difference being that it gives you 200k
> > operations per second and you don't need a separate DB server.)
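To try the usage pattern above without a native addon, here is a hypothetical in-memory stand-in. `open`, `db.map`, `db.smap`, the 'put' event and `range` are assumptions modeled on the example, not the API of any existing module, and the example email addresses are placeholders. For brevity the ordered map is a sorted array located by binary search, so inserts cost O(n); a real implementation would use a balanced tree or similar structure.

```javascript
const { EventEmitter } = require("events");

// Hypothetical unordered map that emits 'put' after each write.
class HashMapStore extends EventEmitter {
  constructor() { super(); this.m = new Map(); }
  put(k, v) { this.m.set(k, v); this.emit("put", k, v); }
  get(k) { return this.m.get(k); }
}

// Hypothetical ordered map: sorted key array + value map (O(n) insert, sketch only).
class SortedMapStore extends EventEmitter {
  constructor() { super(); this.keys = []; this.vals = new Map(); }

  // Binary search: first index i with keys[i] >= k.
  indexOf(k) {
    let lo = 0, hi = this.keys.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.keys[mid] < k) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  put(k, v) {
    if (!this.vals.has(k)) this.keys.splice(this.indexOf(k), 0, k);
    this.vals.set(k, v);
    this.emit("put", k, v);
  }

  get(k) { return this.vals.get(k); }

  // Entries with from <= key < to, in key order.
  range(from, to) {
    const out = [];
    for (let i = this.indexOf(from); i < this.keys.length && this.keys[i] < to; i++) {
      out.push([this.keys[i], this.vals.get(this.keys[i])]);
    }
    return out;
  }
}

// Hypothetical open(): `path` is ignored, everything lives in memory.
function open(path, cb) {
  const stores = new Map();
  const named = (Cls, name) => {
    const key = Cls.name + ":" + name;
    if (!stores.has(key)) stores.set(key, new Cls());
    return stores.get(key);
  };
  const db = {
    map: (name) => named(HashMapStore, name),
    smap: (name) => named(SortedMapStore, name),
  };
  process.nextTick(cb, null, db); // Node-style async callback: (err, db)
}

// Usage, mirroring the example above.
open("/path/db", (err, db) => {
  if (err) throw err;
  const users = db.map("users");
  const byEmail = db.smap("users_by_email");
  users.on("put", (id, user) => byEmail.put(user.email, id));
  users.put(1234, { fname: "john", lname: "smith", email: "john@example.com" });
  users.put(42, { fname: "ann", lname: "lee", email: "ann@example.com" });
  console.log(byEmail.range("a", "k")); // both entries, ordered by email
});
```

Because `emit` is synchronous, the index update runs in the same tick as the primary write; with a persistent store, keeping the two writes consistent across a crash would still need something like an atomic batch.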
>
> > So if you guys have any constructive input regarding this, please post
> > it here.

-- 
Job Board: http://jobs.nodejs.org/
Posting guidelines: 
https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en
