[nodejs] Re: Intrinsic datastores for Node.js

Juraj Vitko Wed, 22 Feb 2012 02:53:49 -0800

OK, it turns out that the DB in the paper
(http://read.seas.harvard.edu/~kohler/pubs/mao12cache.pdf) is already a
tree-based one (it supports range queries while being faster than
memcache, mongo, redis, etc.). I haven't read the paper fully, but I
assume it will support ordered iteration as well. Good times, great day.


On Feb 22, 12:04 pm, Juraj Vitko <[email protected]> wrote:
> Dobes, good that you brought up the Globals DB here - when compiling a
> list of existing Node.js native addon DB implementations for the
> nodejsdb.com page, I did not include the embedded Globals because
> setting it up did not seem straightforward - e.g. you need a Linux
> machine and have to do some setup steps outside of the `npm install` step. I
> haven't used Globals yet and from their page I don't really understand
> it, but I'm sure it would be good to have it as an in-process /
> intrinsic option.
>
> Re: your reasoning about standalone DBs - that reasoning is of course
> sound - but what I'm proposing here is finding a basic/primitive set
> of DB building blocks, which, besides allowing implementation of
> simplistic intrinsic/in-process DBs with minimal config and easy
> deployment etc., would also allow implementation (and more importantly,
> JS-like evolution) of standalone database products based on Node.js.
> One can imagine this being similar to bolting V8 on MySQL, or Lua on
> Redis, but the other way around - bolting a DB on Node, in the form of
> a few functions in a native addon, and implementing all the complex
> logic in JS. (Perhaps later implementing proven concepts in
> github.com/d5/node.native.)
>
> (On the nodejsdb.com page I'm listing the implementations I'm
> currently aware of. Here I'm trying to get a random input from the
> Node.js community. Also, by talking about this, I'm less likely to
> shelve this into the "don't have time" bucket.)
>
> As I'm saying above, I think a lot could be done on top of a very simple
> set of key-value + ordered-key-value + fast-list APIs. E.g. if someone
> wanted to write an ACID SQL engine on top of it, they should be able
> to.
>
> It just occurred to me today that an ordered map could in fact be
> (internally) implemented on top of a fast unordered key-value map, by
> storing the red-black tree nodes into it as key-value pairs, where the
> value is [left-key,right-key,data]. This could even make a lot of
> sense on cache-optimized key-value implementations like the one I
> spotted today: http://read.seas.harvard.edu/~kohler/pubs/mao12cache.pdf
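A minimal sketch of that node layout, under stated assumptions: a plain JS Map stands in for the fast unordered key-value store, the names OrderedMap, put and range are hypothetical, and the red-black rebalancing is left out to keep it short (so this is an unbalanced binary search tree, not a real red-black tree). Each node is stored under its own key with the value [leftKey, rightKey, data].

```javascript
// Ordered map layered on an unordered key-value store (hypothetical sketch).
// Node value layout: [leftKey, rightKey, data]. No rebalancing is done here.
function OrderedMap(store) {
  this.store = store;   // any unordered map with get/set; a JS Map in this demo
  this.rootKey = null;  // key of the root node, or null while the tree is empty
}

// Insert or overwrite a key: one store lookup per tree level.
OrderedMap.prototype.put = function (key, data) {
  if (this.rootKey === null) {
    this.store.set(key, [null, null, data]);
    this.rootKey = key;
    return;
  }
  let cur = this.rootKey;
  for (;;) {
    const node = this.store.get(cur);
    if (key === cur) {
      node[2] = data;
      this.store.set(cur, node); // write back, in case the store returns copies
      return;
    }
    const side = key < cur ? 0 : 1; // 0 = left child, 1 = right child
    if (node[side] === null) {
      node[side] = key;
      this.store.set(cur, node);
      this.store.set(key, [null, null, data]);
      return;
    }
    cur = node[side];
  }
};

// Visit all entries with lo <= key <= hi, in ascending key order.
OrderedMap.prototype.range = function (lo, hi, visit) {
  const walk = (k) => {
    if (k === null) return;
    const [left, right, data] = this.store.get(k);
    if (lo < k) walk(left);                  // smaller keys may still be in range
    if (lo <= k && k <= hi) visit(k, data);
    if (k < hi) walk(right);                 // larger keys may still be in range
  };
  walk(this.rootKey);
};

// Usage: insert in arbitrary order, read back a sorted range.
const index = new OrderedMap(new Map());
['carol', 'alice', 'bob', 'dave'].forEach((name, i) => index.put(name, i));
const seen = [];
index.range('b', 'd', (k) => seen.push(k));
console.log(seen);
```

Every step down the tree costs one lookup in the underlying store, so whether this pays off depends on how cheap those lookups are.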
>
> On Feb 22, 6:05 am, Dobes <[email protected]> wrote:
>
> > There was recently some discussion around the "globals" database which
> > is a native node.js module providing a sorted key/value store you
> > could build some other kinds of database on top of.
>
> > Berkeley DB
> > (http://www.oracle.com/technetwork/database/berkeleydb/overview/index....)
> > is an embedded database that has been popular in the past.  It hasn't
> > been ported to a node.js module yet, but it could be a good
> > candidate.  It allows you to operate as a simple key/value store or as
> > a transactional, distributed database.
>
> > In response to another poster, I'd be wary of database systems that
> > just try to "transparently" persist objects you're changing because
> > you're opening up a can of worms when it comes to concurrency control,
> > unless you only allow single-process access to the database.
>
> > If you look at other programming languages and stacks you can see a
> > variety of in-process databases available, but they aren't as popular
> > as databases that run separately and are accessed via an API.  Off the
> > top of my head I'd guess that is because:
>
> > 1. People want to run multiple applications that share the same
> > database, or a cluster where there are more app server instances than
> > database instances.  In this case the database can operate more
> > robustly in a single process handling things like caching, concurrency
> > control, and so on.  Even if you have only a single web server
> > instance you often want to run background jobs and workers that need
> > concurrent access.
> > 2. When you start to split your application across multiple pieces of
> > hardware (or just multiple virtual machines) you're going to have to
> > introduce a TCP API of some sort anyway, so your in-process database
> > has a limited lifespan.
>
> > I think there is a class of applications that do make good use of
> > embedded databases - for example, desktop applications.  When you have
> > a single-tenant database running on just one computer, the embedded
> > database makes sense.  This is where you see embedded databases like
> > Berkeley DB and SQLite being used quite commonly.  Even applications
> > people consider to be "document" applications like Microsoft Word are
> > actually now using some kind of database engine under the hood.
>
> > If an embedded database is suitable for your future plans, it's
> > probably best to wrap an existing battle-tested library like SQLite
> > or Berkeley DB (there are a few others, too, whose names I cannot
> > recall).  Or use one that's already wrapped like GlobalsDB if it suits
> > your needs.
>
> > Cheers,
>
> > Dobes
>
> > On Feb 21, 5:09 pm, Juraj Vitko <[email protected]> wrote:
>
> > > https://github.com/ypocat/nodejsdb (or http://nodejsdb.com)
>
> > > tl;dr - There are standalone database products (free or not), and
> > > that's perfectly cool, but we already know how that works, so let's
> > > try something different now.
>
> > > The general idea is to get Node.js and a data storage engine into a
> > > tighter relationship, primarily to have more control of the data, but
> > > also simpler stack, and even higher performance in accessing the data.
>
> > > I'm using the name "Intrinsic" because "In-process" is not exactly
> > > accurate. E.g. there may be a shared-memory implementation shared by
> > > multiple Nodes, or synchronized in-process implementation shared by
> > > different Node Isolates (if these make it into Node), etc.
>
> > > I really like the base concept of Redis, because it provides simple,
> > > reliable, predictable and fast primitive building blocks (in the form
> > > of commands) which can support various app logic strategies, and it's
> > > not hiding the complexities and overheads of storing and querying
> > > data, the way more complex DBs do. (So you are more likely to end up
> > > with stable production, instead of fiascos with overflowing
> > > shards etc.)
>
> > > This is also a vague follow-up to this discussion (in this group):
> > > http://goo.gl/mDWqR - although I believe we should not insist only on
> > > in-memory implementations at this time.
>
> > > As for the basic set of data structures and operations that I
> > > believe would support the above, I think we need:
>
> > > 1) fast unordered Hash Map (key, value) 
> > > (candidate: http://code.google.com/p/sparsehash/)
>
> > > 2) Ordered Map (with minimal empty 'value' overhead to allow for
> > > Ordered Set implementation if someone wants it) 
> > > (candidate: http://www.cs.princeton.edu/~rs/talks/LLRB/08Penn.pdf)
>
> > > 3) a list that can be used for FIFO, LIFO, stack, etc. - probably
> > > something close to STL's deque
> > > (http://www.cplusplus.com/reference/stl/deque/)
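For item 3, a rough sketch of what a deque-like list could look like from the JS side. This is an illustration only: the Deque class and its method names are hypothetical, and a real implementation would presumably live in the native addon rather than in a JS Map. Elements are stored under integer slots between a moving head and tail, so pushes and pops at either end are constant time.

```javascript
// Hypothetical double-ended queue; elements occupy slots [head, tail).
class Deque {
  constructor() {
    this.items = new Map(); // slot number -> element
    this.head = 0;          // slot of the first element
    this.tail = 0;          // slot just past the last element
  }
  get length() { return this.tail - this.head; }
  pushBack(v) { this.items.set(this.tail, v); this.tail += 1; }
  pushFront(v) { this.head -= 1; this.items.set(this.head, v); }
  popBack() {
    if (this.length === 0) return undefined;
    this.tail -= 1;
    const v = this.items.get(this.tail);
    this.items.delete(this.tail);
    return v;
  }
  popFront() {
    if (this.length === 0) return undefined;
    const v = this.items.get(this.head);
    this.items.delete(this.head);
    this.head += 1;
    return v;
  }
}

// The same structure used as a FIFO queue and as a LIFO stack.
const q = new Deque();
q.pushBack(1); q.pushBack(2); q.pushBack(3);
const first = q.popFront(); // FIFO: oldest element comes out first
const last = q.popBack();   // LIFO: newest element comes out first
q.pushFront(0);
console.log(first, last, q.length);
```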
>
> > > I think the API for the above should be as simple as possible, so that
> > > we can have multiple implementations and various optimizations later,
> > > while keeping the amount of needed work down. Also, a terse API is
> > > simple to use.
>
> > > From Node, we could do something like:
>
> > > require('a-nodejsdb-impl').open('/path/db', function(err, db) {
> > >   var users = db.map('users');
> > >   var users_ordered_by_email = db.smap('users_by_email');
> > >   users.on('put', function(k, v) {
> > >     users_ordered_by_email.put(v.email, k);
> > >   });
> > >   users.put(1234, { fname: 'john', lname: 'smith', email: '[email protected]' });
> > > });
>
> > > ..which implements a basic User table with primary key on ID and
> > > ordered index on Email. (The difference being that it gives you 200k
> > > operations per second and you don't need a separate DB server.)
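To make the snippet above runnable without a native addon, here is an in-memory stand-in for the proposed API. Everything in it is an assumption for illustration: 'a-nodejsdb-impl' does not exist as far as this thread shows, so open(), MemMap and MemSortedMap are hypothetical helpers, the path argument is ignored, nothing is persisted, and the email values are placeholder strings. It only shows how the 'put' event keeps the ordered index in sync with the primary map.

```javascript
const { EventEmitter } = require('events');

// Stand-in for db.map(): unordered map that emits 'put' after each write.
class MemMap extends EventEmitter {
  constructor() { super(); this.data = new Map(); }
  put(k, v) { this.data.set(k, v); this.emit('put', k, v); }
  get(k) { return this.data.get(k); }
}

// Stand-in for db.smap(): sorts keys on every read, acceptable for a demo only.
class MemSortedMap extends MemMap {
  keys() { return Array.from(this.data.keys()).sort(); }
}

// Hypothetical open(): ignores the path and calls back synchronously for brevity.
function open(path, callback) {
  const maps = new Map();
  const named = (name, Ctor) => {
    if (!maps.has(name)) maps.set(name, new Ctor());
    return maps.get(name);
  };
  callback(null, {
    map: (name) => named(name, MemMap),
    smap: (name) => named(name, MemSortedMap),
  });
}

let emailOrder = null; // filled in by the callback below
let idForAnn = null;

open('/path/db', (err, db) => {
  if (err) throw err;
  const users = db.map('users');
  const usersByEmail = db.smap('users_by_email');
  // Maintain the secondary index on every write to the primary map.
  users.on('put', (k, v) => usersByEmail.put(v.email, k));
  users.put(1234, { fname: 'john', lname: 'smith', email: 'john.example' });
  users.put(1235, { fname: 'ann', lname: 'jones', email: 'ann.example' });
  emailOrder = usersByEmail.keys();        // index keys in sorted order
  idForAnn = usersByEmail.get('ann.example'); // lookup by email gives the user ID
});

console.log(emailOrder, idForAnn);
```

Note that the index here is updated by an event listener, so a failed or skipped listener would leave the two maps out of sync; a real implementation would need to decide whether index maintenance belongs in JS or in the storage layer.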
>
> > > So if you guys have any constructive input regarding this, please post
> > > it here.

