I have read all about GAE, watched the interesting Google I/O videos,
and written some simple (toy) applications.  Now, I would really like
to be able to demonstrate to myself, with a simple toy application,
that GAE can out-scale what a single dedicated host can do.

When I set off on this experiment, I was expecting it to be easy to
demonstrate that GAE can handle some request pattern which my PC
(temporarily running as a LAMP [with mod_python] box) would grind to a
halt on.

Perhaps my results are just an artifact of the current quotas, but
more likely my test applications aren't stressing scalability in the
directions GAE actually scales.  I'll describe my current approach and
hopefully people here can point me in the right direction.  I use a
program like httperf to spawn requests from a cluster of (~20)
machines.  I specify a page to request, the target requests-per-second
rate, and the duration of the test.  I also specify a time period over
which the request rate grows linearly from 0 to the target.  I'm
currently trying parameters in the range of 10-20 reqs/sec for a
duration of a few minutes after a ramp-up of a few minutes.
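
For the ramp I've been using a thin wrapper that re-invokes httperf
at stepped rates, something along these lines (the host, URI, and
step parameters are placeholders; --rate and --num-conns are standard
httperf flags):

    #!/usr/bin/env python
    # ramp.py -- drive httperf at linearly increasing request rates.
    import subprocess

    HOST = 'myapp.appspot.com'   # or the LAMP box's hostname
    URI = '/bench'
    TARGET_RATE = 20             # reqs/sec at the end of the ramp
    STEPS = 10                   # number of ramp steps
    STEP_SECS = 30               # seconds spent at each step

    for i in range(1, STEPS + 1):
        rate = TARGET_RATE * i // STEPS
        if rate < 1:
            continue
        # --num-conns = rate * duration holds each step for STEP_SECS.
        subprocess.call(['httperf', '--server', HOST, '--uri', URI,
                         '--rate', str(rate),
                         '--num-conns', str(rate * STEP_SECS)])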

I have tried this with three separate toy applications:
   1) A python script which sleeps, say for 0.1sec, then returns a
trivial page.  This was a silly idea because the PC simply spawns tons
of processes which are doing nothing and easily keeps up with even
quite high request rates.  GAE dies much sooner, since time spent
sleeping seems to count against your CPU quota.
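
The handler itself is trivial, something like this (a sketch using
the webapp framework; the handler name is made up):

    import time
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class SleepPage(webapp.RequestHandler):
        def get(self):
            time.sleep(0.1)  # sleep instead of doing real work
            self.response.out.write('done')

    application = webapp.WSGIApplication([('/', SleepPage)])
    run_wsgi_app(application)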

   2) A python script which generates a random number through a
process which is intentionally slow -- so that an individual request
takes about 0.1sec on both my PC and GAE.  At >20 reqs/sec, it seems
like the (dual-core) PC should die as requests come in faster than it
can handle them, while GAE should be able to scale -- but GAE ends up
going over quota instead.  Maybe I need to revisit the ramp-up and
make sure it is really working like I think it is?
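
The slow generator is nothing clever, just a hand-tuned busy loop
along these lines (the iteration count is whatever burns ~0.1sec on
my hardware, so treat the constants as illustrative):

    import random

    def slow_random():
        # Linear-congruential churn purely to burn CPU; the iteration
        # count is tuned by hand to take roughly 0.1sec per call.
        x = random.randint(0, 2 ** 31 - 1)
        for _ in xrange(500000):
            x = (x * 1103515245 + 12345) % (2 ** 31)
        return x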

   3) A DB of string-integer pairs, where the strings average 50
bytes.  A python script queries for the first N records greater than
some random string, then returns the sum of the integers from a
randomly chosen subset of those N records.  The PC runs MySQL, does
the same select, and does the summation by iterating over the N
records like the GAE app does, to be "fair" (SQL could probably be
even faster by doing the aggregation in the query rather than in
python).  With N=20 and about 300k records in the DB, GAE
unfortunately times out while the PC version is almost instantaneous
(for individual requests).
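
In GAE terms the query side looks roughly like this (the model,
property, and helper names are made up; random_string() stands for
whatever generated the keys in the first place):

    import random
    from google.appengine.ext import db

    class Pair(db.Model):
        name = db.StringProperty()      # ~50-byte random string
        value = db.IntegerProperty()

    def sum_some(n=20):
        start = random_string()  # hypothetical helper, not shown
        records = Pair.all().filter('name >', start).order('name').fetch(n)
        chosen = random.sample(records, min(len(records), n // 2))
        return sum(r.value for r in chosen)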


What ideas do you have for demonstrating that GAE can scale better
than a single PC, at least under certain conditions?  I have no doubt
that I'm missing something.  For now, I'm going to focus on the
ramp-up portion and make sure that's working correctly, though I
imagine there are better approaches than the three above and I'd love
to hear your thoughts.

:)  Thanks.

~ David