That's about $14 per GB (ignoring the cost of the host machine and operating costs for simplicity). I'd say the first question is whether the data stored in that GB of RAM can make you more than $14 (in a reasonable time), and from then on you can scale out. The next limit to look out for is how much RAM you can slot into a single node. After that, it's about the throughput, latency and price of cluster interconnects, and your ability to deal with the utter ugliness of data sharding. The next limit is how processable this memory will be - i.e. how many hardware threads (and L1/L2 caches and memory controllers) you can unleash to work with this huge RAM, and what the cost of thread synchronization will be. (You don't want a 2TB RAM machine capped at 1,000 reqs/sec.) But of course, we are already talking about an architecture with the potential to leave much bigger clusters of classic solutions in the dust, and to enable novel business models in the process (a big forum where you don't have to wait a few seconds to post a message and another few seconds to reload the page to see others' messages - a real-time, 100k-user, socket.io, ACID pub/sub, anyone? Things like that.)
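For reference, the per-GB figure comes straight from the numbers in the 37signals post quoted below ($12,000 for 864 GB):

```javascript
// Cost-per-GB arithmetic for the machine in the 37signals post
// (host machine and operating costs ignored, as above).
const totalPriceUsd = 12000;
const totalRamGb = 864;
const costPerGb = totalPriceUsd / totalRamGb;

console.log(costPerGb.toFixed(2)); // prints "13.89"
```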
But I think the most important point you are hinting at is having a primitive, but insanely fast, programmable and predictable database residing in-process with Node.js (allocating data outside of the V8 heap, of course). I believe this is where it's all going. If we look at the area of database development, where is the main bottleneck? OK, I see three.

First, we spend quite an insane amount of money in the form of CPU cycles just to have the DB separated from the application server, out of a multitude of fears: the most prominent being not letting our "savage" app coders near the holy DB code (eventually the same folks end up working on the app and the DB anyway); having the DB serve multiple web servers, which historically are slow and can't handle many concurrent users when not employing evented I/O (yet nowadays the reality is that a single Node.js will be waiting for your DB server, no matter how optimized or cached); and achieving better stability - in case your web server goes down, your DB will stay up (historically, your app server is trash hardware, and your holy DB is a rugged, expensive machine with a RAID controller with a battery-backed cache, etc., unlikely to ever go down -- well, today we prefer to have redundancy everywhere and do it on the cheap).

Second, we try to keep the DB API simple and clean to keep things fast, maintainable and reliable (yet what eventually always happens is that we are forced to bring some form of scripting to the database, e.g. SQL stored procs, Lua in Redis - because it simply makes a lot of sense).

Third, DBs evolve slowly, the reason being that C/C++, compared to other languages, is a horrid environment for the evolution of complex software. The DB is not just storing data - it contains query logic, persistence logic, cluster logic - lots of coding and lots of potential for evolution.
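A minimal sketch of what such an in-process store might look like, using Node's Buffer (whose backing memory is allocated outside the V8 heap) as a stand-in for a dedicated allocator - the BufferStore name and API are invented here for illustration:

```javascript
// Hypothetical in-process key/value store. Values are copied into
// Buffers, whose backing memory lives outside the V8 heap; keys and
// the Map itself still live on the heap, so this only sketches the
// idea - it is not a real off-heap allocator or native module.
class BufferStore {
  constructor() {
    this.entries = new Map(); // key -> Buffer
  }
  set(key, value) {
    // Copy the string into freshly allocated off-heap memory.
    this.entries.set(key, Buffer.from(String(value), 'utf8'));
  }
  get(key) {
    const buf = this.entries.get(key);
    return buf === undefined ? undefined : buf.toString('utf8');
  }
  del(key) {
    return this.entries.delete(key);
  }
}

const store = new BufferStore();
store.set('user:1', 'Alexey');
console.log(store.get('user:1')); // prints "Alexey"
```

The point of the sketch is only the shape: a tiny get/set/del surface in the same process as the app, with the bulk data kept off the garbage-collected heap.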
So why not keep the core as simple and fast as possible in C/C++, and throw the rest of the DB logic into an environment where we can already observe *explosive* evolution of frameworks/modules - which Node.js undoubtedly is? (Fun fact - today I noticed: http://rubygems.org/gems - 7 years - 2202 packages; http://search.npmjs.org/ - 2 years - 6862 packages -- and this is just comparing Node.js with another language that is considered to be *absolutely great* for development.) Worth mentioning here are Redis with its Lua scripting support, and the Alchemy Database, a SQL subset built in Lua on top of the terse Redis API: http://code.google.com/p/alchemydatabase/ (I'm not advocating SQL or no-SQL here at all.) Another case in point appears to be the GlobalDB just mentioned in this discussion and the Caché product built around it - but this is the first time I've seen these, so pardon me if I'm way off.

So, what I think we need is a Node.js native module with a perhaps even more limited/simpler API than Redis has, working in its own memory segment, perhaps with its own memory allocator, and then we can all have fun building our flavors of DB logic around that - clustering, persistence, transactions, whatever - and we may start seeing some really interesting things quite soon. Btw - if someone is intent on having the Web/App server and DB separated, they can still do it - the Node.js process running the DB module can act as a standalone server too.

J.

On Jan 27, 7:23 pm, Alexey Petrushin <[email protected]> wrote:
> Today I saw this note about 864GB of RAM for $12,000 and got a little
> stunned (http://37signals.com/svn/posts/3090-basecamp-nexts-caching-hardware).
>
> I mean - if memory is so cheap now, maybe we can use in-memory databases
> for some situations? In average projects the size of the database is
> about a couple of tens or hundreds of GB (not talking about analytics
> and other data-heavy apps).
> For example, let's suppose we can split (shard) our dataset into pieces
> of ~50Mb each (online organizers, task managers, sites, and so on,
> splitting data by account id).
>
> In this specific situation (a set of small independent datasets) we can
> ignore most complexities of DB technology: we don't care about
> consistency, availability, concurrency, MVCC, locking and throughput for
> the 50Mb (the only problem - long IO operations - but Node.JS is good
> with that).
>
> Another problem - indexes, but I believe it's easy. If it's possible to
> build an index with CouchDB using map/reduce, with lots of limitations,
> then it should be even easier to build an index using arbitrary code and
> without any limitations.
>
> Persistence & fault-tolerance - save changes after every update to mysql
> and read the whole dataset into memory after starting the app process.
>
> Deployment - there are N node processes with M accounts in each, and a
> balancing proxy sending each request to the N-th node by a consistent
> account-id hash.
>
> Tempted to try such an architecture (an in-memory, in-process, ad-hoc DB
> for Node.JS) and would like to hear a critique of it (I mean using it
> only in cases where we can split our data into small independent pieces;
> I don't consider this approach outside of that area). So, what do you
> think?
>
> And, maybe someone is already using this, or maybe there are some such
> database projects on github?
>
> Thanks

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en
