That's about $14 per GB (ignoring the cost of the host machine and operating costs for simplicity). I'd say the first question is whether the data stored in that GB of RAM can make you more than $14 (in a reasonable time), and from then on you can scale out. The next limit to look out for is how much RAM you can slot into a single node. After that, it's about the throughput, latency and price of cluster interconnects, and your ability to deal with the utter ugliness of data sharding. The next limit is how processable this memory will be - i.e. how many hardware threads (and L1/L2 caches and memory controllers) you can unleash to work with this huge RAM, and what the cost of thread synchronization will be. (You don't want a 2TB RAM machine capped at 1,000 reqs/sec.) But of course, we are already talking about an architecture with the potential to leave much bigger clusters of classic solutions in the dust, and to enable novel business models in the process (a big forum where you don't have to wait a few seconds to post a message and another few seconds to reload the page to see others' messages - a real-time, 100k-user, socket.io, ACID pub/sub, anyone? Things like that.)
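For reference, the per-GB figure comes straight from the numbers in the 37signals post quoted below ($12,000 for 864 GB):

```javascript
// Cost-per-GB arithmetic for the machine in the 37signals post
// (host machine and operating costs ignored, as above).
const totalPriceUsd = 12000;
const totalRamGb = 864;
const costPerGb = totalPriceUsd / totalRamGb;

console.log(costPerGb.toFixed(2)); // prints "13.89"
```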
But I think the most important point you are hinting at is having a primitive, but insanely fast, programmable and predictable database residing in-process with Node.js (allocating data outside of the V8 heap, of course). I believe this is where it's all going. If we look at the area of database development, where is the main bottleneck? OK, I see three.

First, we spend quite an insane amount of money in the form of CPU cycles just to have the DB separated from the application server, out of a multitude of fears: the most prominent being not letting our "savage" app coders near the holy DB code (eventually the same folks end up working on the app and the DB anyway); having the DB serve multiple web servers, which historically are slow and can't handle many concurrent users when not employing evented I/O (yet nowadays the reality is that a single Node.js will be waiting for your DB server, no matter how optimized or cached); and achieving better stability - in case your web server goes down, your DB will stay up (historically, your app server is trash hardware, and your holy DB is a rugged, expensive machine with a RAID controller with a battery-backed cache, etc., unlikely to ever go down -- well, today we prefer to have redundancy everywhere and do it on the cheap).

Second, we try to keep the DB API simple and clean to keep things fast, maintainable and reliable (yet what eventually always happens is that we are forced to bring some form of scripting to the database, e.g. SQL stored procs, Lua in Redis - because it simply makes a lot of sense).

Third, DBs evolve slowly, the reason being that C/C++, compared to other languages, is a horrid environment for the evolution of complex software. The DB is not just storing data - it contains query logic, persistence logic, cluster logic - lots of coding and lots of potential for evolution.
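A minimal sketch of what such an in-process store might look like, using Node's Buffer (whose backing memory is allocated outside the V8 heap) as a stand-in for a dedicated allocator - the BufferStore name and API are invented here for illustration:

```javascript
// Hypothetical in-process key/value store. Values are copied into
// Buffers, whose backing memory lives outside the V8 heap; keys and
// the Map itself still live on the heap, so this only sketches the
// idea - it is not a real off-heap allocator or native module.
class BufferStore {
  constructor() {
    this.entries = new Map(); // key -> Buffer
  }
  set(key, value) {
    // Copy the string into freshly allocated off-heap memory.
    this.entries.set(key, Buffer.from(String(value), 'utf8'));
  }
  get(key) {
    const buf = this.entries.get(key);
    return buf === undefined ? undefined : buf.toString('utf8');
  }
  del(key) {
    return this.entries.delete(key);
  }
}

const store = new BufferStore();
store.set('user:1', 'Alexey');
console.log(store.get('user:1')); // prints "Alexey"
```

The point of the sketch is only the shape: a tiny get/set/del surface in the same process as the app, with the bulk data kept off the garbage-collected heap.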
So why not keep the core as simple and fast as possible in C/C++, and throw the rest of the DB logic into an environment where we can already observe *explosive* evolution of frameworks/modules - which Node.js undoubtedly is? (Fun fact - today I noticed: http://rubygems.org/gems - 7 years - 2202 packages; http://search.npmjs.org/ - 2 years - 6862 packages -- and this is just comparing Node.js with another language that is considered to be *absolutely great* for development.) Worth mentioning here are Redis with its Lua scripting support, and the Alchemy Database, a SQL subset built in Lua on top of the terse Redis API: http://code.google.com/p/alchemydatabase/ (I'm not advocating SQL or no-SQL here at all.) Another case in point appears to be the GlobalDB just mentioned in this discussion and the Caché product built around it - but this is the first time I've seen these, so pardon me if I'm way off.

So, what I think we need is a Node.js native module with a perhaps even more limited/simpler API than Redis has, working in its own memory segment, perhaps with its own memory allocator, and then we can all have fun building our flavors of DB logic around that - clustering, persistence, transactions, whatever - and we may start seeing some really interesting things quite soon. Btw - if someone is intent on having the Web/App server and DB separated, they can still do it - the Node.js process running the DB module can act as a standalone server too.

J.

On Jan 27, 7:23 pm, Alexey Petrushin <[email protected]> wrote:
> Today I saw this note about 864GB of RAM for $12,000 and got a little
> stunned (http://37signals.com/svn/posts/3090-basecamp-nexts-caching-hardware).
>
> I mean - if memory is so cheap now, maybe we can use in-memory databases
> for some situations? In average projects the size of the database is
> about a couple of tens or hundreds of GB (not talking about analytics
> and other data-heavy apps).
> For example, let's suppose we can split (shard) our dataset into pieces
> of ~50Mb each (online organizers, task managers, sites, and so on,
> splitting data by account id).
>
> In this specific situation (a set of small independent datasets) we can
> ignore most complexities of DB technology: we don't care about
> consistency, availability, concurrency, MVCC, locking and throughput for
> the 50Mb (the only problem - long IO operations - but Node.JS is good
> with that).
>
> Another problem - indexes, but I believe it's easy. If it's possible to
> build an index with CouchDB using map/reduce, with lots of limitations,
> then it should be even easier to build an index using arbitrary code and
> without any limitations.
>
> Persistence & fault-tolerance - save changes after every update to mysql
> and read the whole dataset into memory after starting the app process.
>
> Deployment - there are N node processes with M accounts in each, and a
> balancing proxy sending each request to the N-th node by a consistent
> account-id hash.
>
> Tempted to try such an architecture (an in-memory, in-process, ad-hoc DB
> for Node.JS) and would like to hear a critique of it (I mean using it
> only in cases where we can split our data into small independent pieces;
> I don't consider this approach outside of that area). So, what do you
> think?
>
> And, maybe someone is already using this, or maybe there are some such
> database projects on github?
>
> Thanks

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en
