OK, turns out that the DB in the paper (http://read.seas.harvard.edu/~kohler/pubs/mao12cache.pdf) is actually tree-based already (it supports range queries while being faster than memcached, MongoDB, Redis, etc.). I haven't read the paper fully, but I assume it will support ordered iteration as well. Good times, great day.
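
For reference, the idea quoted below (an ordered map layered on an unordered key-value store, with each tree node stored as [left-key,right-key,data]) could look roughly like this. It is only a minimal sketch under loud assumptions: a plain JavaScript Map stands in for the fast unordered store, the tree is a plain unbalanced binary search tree rather than a red-black tree, and the names ROOT, put and inOrder are hypothetical, not part of any existing module.

```javascript
// Ordered map layered on an unordered key-value store.
// Each tree node is stored under its own key as [leftKey, rightKey, data].
// Assumption: a plain Map stands in for the fast unordered store, and the
// tree is an unbalanced BST (no red-black rotations) to keep the sketch short.
const ROOT = '__root__'; // hypothetical reserved key holding the root's key

function put(store, key, data) {
  const rootKey = store.get(ROOT);
  if (rootKey === undefined) {
    // Empty tree: the new node becomes the root.
    store.set(ROOT, key);
    store.set(key, [null, null, data]);
    return;
  }
  let cur = rootKey;
  for (;;) {
    const node = store.get(cur);
    if (key === cur) {
      // Existing key: overwrite the data, keep the child links.
      node[2] = data;
      store.set(cur, node);
      return;
    }
    const side = key < cur ? 0 : 1; // 0 = left child, 1 = right child
    if (node[side] === null) {
      // Free slot: link the new node from its parent and store it.
      node[side] = key;
      store.set(cur, node);
      store.set(key, [null, null, data]);
      return;
    }
    cur = node[side];
  }
}

// In-order traversal yields [key, data] pairs in ascending key order.
function* inOrder(store, key = store.get(ROOT)) {
  if (key === undefined || key === null) return;
  const [left, right, data] = store.get(key);
  yield* inOrder(store, left);
  yield [key, data];
  yield* inOrder(store, right);
}

const store = new Map();
for (const k of ['m', 'c', 'x', 'a']) put(store, k, k.toUpperCase());
const sorted = [...inOrder(store)].map(([k]) => k);
console.log(sorted);
```

Every step of a lookup or insert costs one get on the underlying store, so without rebalancing a sorted insertion order degrades to a linked list. That is the part the red-black (or LLRB) rotations would fix.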
On Feb 22, 12:04 pm, Juraj Vitko <[email protected]> wrote:
> Dobes, good that you brought up the Globals DB here - when compiling a
> list of existing Node.js native addon DB implementations for the
> nodejsdb.com page, I did not include the embedded Globals because
> setting it up did not seem straightforward - e.g. you need a Linux
> machine and have to do some setup steps outside of the `npm install`
> step. I haven't used Globals yet and from their page I don't really
> understand it, but I'm sure it would be good to have it as an
> in-process / intrinsic option.
>
> Re: your reasoning about standalone DBs - that reasoning is of course
> sound - but what I'm proposing here is finding a basic/primitive set
> of DB building blocks, which, besides allowing implementation of
> simplistic intrinsic/in-process DBs with minimal config and easy
> deployment etc., also allows implementation (and more importantly,
> JS-like evolution) of standalone database products based on Node.js.
> One can imagine this being similar to bolting V8 on MySQL, or Lua on
> Redis, but from the other way - bolting a DB on Node, in the form of a
> few functions in a native addon, and implementing all the complex logic
> in JS. (Perhaps later implementing proven concepts in
> github.com/d5/node.native).
>
> (On the nodejsdb.com page I'm listing the implementations I'm
> currently aware of. Here I'm trying to get some random input from the
> Node.js community. Also, by talking about this, I'm less likely to
> shelve this into the "don't have time" bucket.)
>
> As I'm saying above, I think a lot could be done on top of a very
> simple key-value + ordered-key-value + fast-list API. E.g. if someone
> wanted to write an ACID SQL engine on top of it, they should be able
> to.
>
> It just occurred to me today that an ordered map could in fact be
> (internally) implemented on top of a fast unordered key-value map, by
> storing the red-black tree nodes in it as key-value pairs, where the
> value is [left-key,right-key,data]. This could even make a lot of
> sense on cache-optimized key-value implementations like the one I
> spotted today: http://read.seas.harvard.edu/~kohler/pubs/mao12cache.pdf
>
> On Feb 22, 6:05 am, Dobes <[email protected]> wrote:
>
> > There was recently some discussion around the "globals" database which
> > is a native node.js module providing a sorted key/value store you
> > could build some other kinds of database on top of.
> >
> > Berkeley DB
> > (http://www.oracle.com/technetwork/database/berkeleydb/overview/index....)
> > is an embedded database that has been popular in the past. It hasn't
> > been ported to a node.js module yet, but it could be a good
> > candidate. It allows you to operate as a simple key/value store or as
> > a transactional, distributed database.
> >
> > In response to another poster, I'd be wary of database systems that
> > just try to "transparently" persist objects you're changing because
> > you're opening up a can of worms when it comes to concurrency control,
> > unless you only allow single-process access to the database.
> >
> > If you look to other programming languages and stacks you can see a
> > variety of in-process databases available, but they aren't as popular
> > as databases that run separately and are accessed via an API. Off the
> > top of my head I'd guess that is because:
> >
> > 1. People want to run multiple applications that share the same
> > database, or a cluster where there are more app server instances than
> > database instances. In this case the database can operate more
> > robustly in a single process handling things like caching, concurrency
> > control, and so on.
> > Even if you have only a single web server
> > instance you often want to run background jobs and workers that need
> > concurrent access.
> > 2. When you start to split your application across multiple pieces of
> > hardware (or just multiple virtual machines) you're going to have to
> > introduce a TCP API of some sort anyway, so your in-process database
> > has a limited lifespan.
> >
> > I think there is a class of applications that do make good use of
> > embedded databases - for example, desktop applications. When you have
> > a single-tenant database running on just one computer, the embedded
> > database makes sense. This is where you see embedded databases like
> > Berkeley DB and SQLite being used quite commonly. Even applications
> > people consider to be "document" applications like Microsoft Word are
> > actually now using some kind of database engine under the hood.
> >
> > If an embedded database is suitable for your future plans, it's
> > probably best to wrap an existing battle-tested library like SQLite
> > or Berkeley DB (there are a few others, too, whose names I cannot
> > recall). Or use one that's already wrapped like GlobalsDB if it suits
> > your needs.
> >
> > Cheers,
> >
> > Dobes
> >
> > On Feb 21, 5:09 pm, Juraj Vitko <[email protected]> wrote:
> >
> > > https://github.com/ypocat/nodejsdb (or http://nodejsdb.com)
> > >
> > > tl;dr - There are standalone database products (free or not), and
> > > that's perfectly cool, but we already know how that works, so let's
> > > try something different now.
> > >
> > > The general idea is to get Node.js and a data storage engine into a
> > > tighter relationship, primarily to have more control of the data, but
> > > also a simpler stack, and even higher performance in accessing the data.
> > >
> > > I'm using the name "Intrinsic" because "In-process" is not exactly
> > > accurate. E.g.
> > > there may be a shared-memory implementation shared by
> > > multiple Nodes, or a synchronized in-process implementation shared by
> > > different Node Isolates (if these make it into Node), etc.
> > >
> > > I really like the base concept of Redis, because it provides simple,
> > > reliable, predictable and fast primitive building blocks (in the form
> > > of commands) which can support various app logic strategies, and it's
> > > not hiding the complexities and overheads of storing and querying
> > > data, the way more complex DBs do. (So you are more likely to have a
> > > more stable production in the end, instead of fiascos with overflowing
> > > shards etc.)
> > >
> > > This is also a vague follow-up to this discussion (in this
> > > group) http://goo.gl/mDWqR - although I believe we should not insist
> > > only on in-memory implementations at this time.
> > >
> > > As for the basic set of data structures and operations that I
> > > believe would support the above, I think we need:
> > >
> > > 1) fast unordered Hash Map (key, value)
> > > (candidate: http://code.google.com/p/sparsehash/)
> > >
> > > 2) Ordered Map (with minimal empty 'value' overhead to allow for
> > > Ordered Set implementation if someone wants it)
> > > (candidate: http://www.cs.princeton.edu/~rs/talks/LLRB/08Penn.pdf)
> > >
> > > 3) a list that can be used for FIFO, LIFO, stack, etc. - probably
> > > something close to STL's deque.
> > > (http://www.cplusplus.com/reference/stl/deque/)
> > >
> > > I think the API for the above should be as simple as possible, so that
> > > we can have multiple implementations and various optimizations later,
> > > while keeping the amount of needed work down. Also, a terse API is
> > > simple to use.
> > >
> > > From Node, we could do something like:
> > >
> > > require('a-nodejsdb-impl').open('/path/db', function(err, db) {
> > >   var users = db.map('users');
> > >   var users_ordered_by_email = db.smap('users_by_email');
> > >   users.on('put', function(k, v) {
> > >     users_ordered_by_email.put(v.email, k);
> > >   });
> > >   users.put(1234, { fname: 'john', lname: 'smith', email: '[email protected]' });
> > > });
> > >
> > > ..which implements a basic User table with primary key on ID and
> > > ordered index on Email. (The difference being that it gives you 200k
> > > operations per second and you don't need a separate DB server.)
> > >
> > > So if you guys have any constructive input regarding this, please post
> > > it here.

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en
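
The users / users_by_email snippet in the original proposal quoted above can be tried out without any native addon. What follows is a rough in-memory sketch, not the proposed module: 'a-nodejsdb-impl' does not exist as far as this thread shows, so the HashMap and SortedMap classes and the range() method are hypothetical stand-ins, and only put() and the 'put' event mirror the names used in the proposal. The ordered map here is a sorted array with binary-search insertion rather than a tree.

```javascript
const { EventEmitter } = require('events');

// Unordered map that emits 'put' after every write (in-memory stand-in
// for the proposed db.map(); the class name is hypothetical).
class HashMap extends EventEmitter {
  constructor() { super(); this.data = new Map(); }
  put(k, v) { this.data.set(k, v); this.emit('put', k, v); }
  get(k) { return this.data.get(k); }
}

// Ordered map (stand-in for the proposed db.smap()): keeps its keys in a
// sorted array. A real implementation would use a balanced tree instead.
class SortedMap extends EventEmitter {
  constructor() { super(); this.keys = []; this.data = new Map(); }
  put(k, v) {
    if (!this.data.has(k)) {
      // Binary search for the insertion point that keeps keys sorted.
      let lo = 0, hi = this.keys.length;
      while (lo < hi) {
        const mid = (lo + hi) >> 1;
        if (this.keys[mid] < k) lo = mid + 1; else hi = mid;
      }
      this.keys.splice(lo, 0, k);
    }
    this.data.set(k, v);
    this.emit('put', k, v);
  }
  // Hypothetical range query: [key, value] pairs with from <= key < to,
  // returned in ascending key order.
  range(from, to) {
    return this.keys
      .filter((k) => k >= from && k < to)
      .map((k) => [k, this.data.get(k)]);
  }
}

const users = new HashMap();
const usersByEmail = new SortedMap();

// Maintain the secondary index on every write to the primary map.
users.on('put', (id, user) => usersByEmail.put(user.email, id));

users.put(1, { fname: 'john', email: 'john@example.com' });
users.put(2, { fname: 'alice', email: 'alice@example.com' });
users.put(3, { fname: 'zed', email: 'zed@example.com' });

// All users whose email sorts between 'a' (inclusive) and 'k' (exclusive).
const hits = usersByEmail.range('a', 'k');
console.log(hits);
```

Note that the index is only ever added to: if a user's email changes, the old index entry stays behind, so a fuller version would also need a delete operation and a way to see the previous value in the 'put' handler.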
