I have read all about GAE, watched the interesting Google I/O videos, and written some simple (toy) applications. Now, I would really like to be able to demonstrate to myself, with a simple toy application, that GAE can out-scale what a single dedicated host can do.
When I set off on this experiment, I was expecting it to be easy to demonstrate that GAE could handle some request pattern that my PC (temporarily running as a LAMP box with mod_python) would grind to a halt on. Perhaps my results are just an artifact of the current quotas, but more likely my test applications just aren't stressing scalability in the directions GAE actually scales. I'll describe my current approach and hopefully people here can point me in the right direction.

I use a program like httperf to spawn requests from a cluster of ~20 machines. I specify a page to request, a target requests-per-second rate, and the duration of the test. I also specify a time period over which the request rate grows linearly from 0 to the target. I'm currently trying parameters in the range of 10-20 reqs/sec, sustained for a few minutes after a ramp-up of a few minutes.

I have tried this with three separate toy applications:

1) A Python script which sleeps, say for 0.1 sec, then returns a trivial page. This was a silly idea because the PC simply spawns tons of processes which are doing nothing and easily keeps up with even quite high request rates, while GAE dies much sooner since sleeping seems to count against your CPU time.

2) A Python script which generates a random number through a process which is intentionally slow, so that an individual request takes about 0.1 sec on both my PC and GAE. At more than 20 reqs/sec I would expect the (dual-core) PC to die as requests come in faster than it can handle them, while GAE should be able to scale -- but GAE ends up going over quota. Maybe I need to revisit the ramp-up process and make sure it is really working like I think?

3) A datastore of string-integer pairs, where the strings average about 50 bytes. A Python script queries for the first N records greater than some random string, then returns the sum of the integers from a randomly chosen subset of those N records. The PC runs MySQL, does the same select, and, to be "fair", does the summation by iterating over the N records in Python just like the GAE app (SQL could probably be even faster by doing the aggregation in the query rather than in Python). With N=20 and about 300k records in the DB, GAE unfortunately times out, while the PC version is almost instantaneous for individual requests.

What ideas do you have for demonstrating that GAE can scale better than a PC, at least under certain conditions? I have no doubt that I'm missing something. For now, I'm going to focus on the ramp-up portion and make sure that's working correctly, though I imagine there are probably better approaches than the three above for demonstrating this, and I'd love to hear your thoughts. :) I've put rough sketches of the load command and the handlers below my signature, in case the details help.

Thanks.

~ David
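P.S. For concreteness, the command each load machine runs looks roughly like the line below. The hostname, rate, and connection count are placeholders, and in my setup the linear ramp-up is just a small wrapper script that re-invokes httperf with an increasing --rate, since httperf itself only takes a fixed rate:

    httperf --server=myapp.appspot.com --port=80 --uri=/sum \
            --rate=15 --num-conns=3000 --num-calls=1 --timeout=5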
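The approach-2 handler just burns CPU to produce a random number. A minimal sketch is below; the loop count is hand-tuned until one request costs roughly 0.1 sec, and the /slow URL is made up for illustration:

    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app
    import random

    class SlowRandomHandler(webapp.RequestHandler):
        def get(self):
            # Deliberately wasteful: draw many random numbers so a single
            # request burns roughly 0.1 sec of CPU on both GAE and my PC.
            x = 0
            for _ in xrange(100000):
                x = random.randint(0, 1000000)
            self.response.out.write(str(x))

    application = webapp.WSGIApplication([('/slow', SlowRandomHandler)])

    def main():
        run_wsgi_app(application)

    if __name__ == '__main__':
        main()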
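And here is the approach-3 handler, again stripped down to a sketch. The model, property, and URL names are made up, N is hard-coded to 20, and the random "subset" is just a coin flip per record:

    from google.appengine.ext import db
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app
    import random
    import string

    class Record(db.Model):
        # Roughly 50-byte random string plus an integer value.
        name = db.StringProperty()
        value = db.IntegerProperty()

    class SumHandler(webapp.RequestHandler):
        def get(self):
            # Pick a random starting string, then fetch the first N=20 records above it.
            start = ''.join(random.choice(string.ascii_lowercase) for _ in range(5))
            records = Record.all().filter('name >', start).order('name').fetch(20)
            # Sum the values of a randomly chosen subset of those records.
            total = sum(r.value for r in records if random.random() < 0.5)
            self.response.out.write(str(total))

    application = webapp.WSGIApplication([('/sum', SumHandler)])

    def main():
        run_wsgi_app(application)

    if __name__ == '__main__':
        main()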
