I think there are developers who are not familiar or comfortable with
SQL and database management systems. This can lead to a number of bad
choices if you are trying to develop a system that needs to store,
select and maintain data. Also, using a DBMS never precludes creating
an in-memory array of semi-static data, writing semi-static HTML
files, or using any other type of cache. Until traffic develops, it is
hard to predict where the slow points will show up, but putting data
into a key-value system from the beginning, when you need to provide
the usual insert/update/delete functions, will probably lengthen
development time.
Of course, if scalability really is the main problem, that should
imply that a higher initial investment in equipment and development
would ensure success. Scalability probably also needs to include the
possibility of scaling beyond one (reasonably sized) machine.
I'm not really sure this is an AOLserver topic, other than as it
relates to AOLserver scalability and the ns_bdb module (which works
very well).
As to scalability in general, in a decade of web interface
programming I've found three major areas of poor performance:
1) scripting overhead, especially as the application matures and more
and more dependent code gets included on each page load
2) SQL database speed, especially in high-concurrency situations, but
also once tables reach millions of rows
3) full text searching
As far as scripting overhead goes, my own "hello world" benchmarks
find AOLserver/Tcl to be by far the fastest scripted web development
platform: about half the speed of serving 1k GIFs, but about 10x
faster than PHP and 3x faster than lighttpd with FastCGI. So AOLserver
is a good platform as far as scripting scalability is concerned, as
long as the developer takes care not to load too much dependent code
per page.
SQL tends not to do well on the web in high-concurrency situations,
especially where lots of writes and reads are occurring at the end of
the table, which is a common scenario. In my experience, many
applications that use SQL actually only need key-lookup capability
(i.e., "your membership settings, your purchase history"), and if the
db supports multiple matching keys, much can be done with that. What's
nice about Berkeley DB is that it runs in-process with an aggressive
cache, so that db reads deliver performance close to that of in-memory
databases. Also, Berkeley DB can scale to running on several synced
databases, which is a highly unusual feature; if you've ever tried to
make that work on Oracle or MS SQL, you know that it's tricky to get
right. Google uses Berkeley DB for their universal login, for just
this reason.
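As a rough illustration of the key-lookup pattern with multiple
records per key, here is a small Python sketch. It is not the Berkeley
DB or ns_bdb API: it uses the dbm.dumb module from the Python standard
library as a stand-in and emulates multiple matching keys by storing a
JSON list under each key. The add_record and lookup helpers and the
"purchases:alice" key layout are assumptions made up for this example.

```python
import dbm.dumb
import json
import os
import tempfile


def add_record(db, key, record):
    """Append a record to the list stored under key."""
    existing = json.loads(db[key]) if key in db else []
    existing.append(record)
    db[key] = json.dumps(existing)


def lookup(db, key):
    """Return every record stored under key, or an empty list."""
    return json.loads(db[key]) if key in db else []


# Hypothetical usage: one key per member and record type.
store_path = os.path.join(tempfile.mkdtemp(), "members")
db = dbm.dumb.open(store_path, "c")
add_record(db, "purchases:alice", {"item": "book", "price": 12})
add_record(db, "purchases:alice", {"item": "pen", "price": 2})
history = lookup(db, "purchases:alice")
```

A store with native duplicate-key support would avoid the
read-modify-write in add_record; the sketch only shows how far a plain
key lookup can go for "your purchase history" style queries.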
Thirdly, you'll notice that many sites have very poor full text
search performance. Lucene, a recently popularized full text search
engine, appears to finally solve this problem. However, in my case I
wanted both fast full text searching and search results grouped by
type, such as Amazon returns. In the late 90s I ran a directory web
site that handled about 300,000 full text search requests per day on a
5 million page corpus. I tried all the commercial and open source
solutions, and nothing could keep up (i.e., deliver results in under 5
seconds at peak load). Back then, I wrote a full text engine (a simple
inverted index) as an AOLserver C extension, using dbm under
Solaris/Intel, and managed peak load search times under a third of a
second, so AOLserver + key-lookup db definitely could scale to very
high levels.
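For readers unfamiliar with the term, a simple inverted index maps
each word to the set of documents containing it, so a search becomes a
handful of key lookups plus a set intersection. The Python sketch
below shows the general idea only; it is not the engine described
above, and tokenize, build_index, search and the sample documents are
all hypothetical names and data chosen for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    postings.sort(key=len)  # intersect the smallest sets first
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical sample corpus.
docs = {
    1: "AOLserver is fast",
    2: "Berkeley DB is fast too",
    3: "slow search",
}
index = build_index(docs)
```

In a key-lookup database, each term would be a key and its posting
list the value, which is why this kind of engine fits a dbm-style
store.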
I know that going with Berkeley DB is controversial, but in my
opinion it's extremely difficult to scale up a SQL-backed application
that's failing in the field. That's generally why Oracle gets all
those dot-coms -- they promise that you'll scale if you go huge, if
you spend enough money. In the OSS world, I've put hundreds of hours
into trying to scale PostgreSQL and MySQL, and it's very difficult to
do on a completed product. My experience is that both of those OSS
dbs can scale to huge heights, but only if you're an expert in that
db platform and do extensive up-front design work to that effect,
which few people do.
Just my opinion... all I can say is that AOLserver + Berkeley DB, if
you can live with a key-lookup database only, is incredibly fast, at
least 3x faster than anything else I've benchmarked.
-john