[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-18 Thread lateef jackson
I guess the confusion came from my original post, where I said:

 * Application servers scale easy as pie, database servers scale like hernias

I should have been more specific: by application servers I meant TG, PHP, or wherever the business logic of the code lives. Maybe "easy as pie" was the wrong analogy to use.

The m/cluster technology is what I usually call a database proxy. CJDBC was the first time I thought about using a db proxy. It is a really great solution for failover but not for scalability. If I am doing more database transactions per second, I can't just add another database server; I have to buy a bigger database server, and if I am using a database proxy with 3 database servers, that means I have to buy 3 bigger database servers. You are technically correct that database writes don't end up in exactly one place, but those writes are copied, not split up. So now every write ends up on every single database node, which only scales if your writes stay the same and your reads increase. In all the applications I have worked on, reads and writes both increase as load increases; sometimes you get a spike in the reads and not the writes, it just depends on the application.

To write in more than one place you have to have the technology to load balance the writes. Then reads need to know which server to go to for which data. I think this would be a blast to code! Fortunately reads are the majority of the work in most applications, especially web applications. That is why some simple caching is easier and more effective than replication of data.
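
A minimal sketch of the "split the writes, and let reads know where to look" idea, assuming a stable hash of the record key decides which server owns it. The shard names, `shard_for` and `ShardedStore` are hypothetical, and plain dicts stand in for database servers; this is not how any particular database proxy works.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connections.
SHARDS = ["db0", "db1", "db2"]


def shard_for(key, shards=SHARDS):
    """Pick the shard that owns `key` using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


class ShardedStore:
    """Toy store: each shard is a dict standing in for one database server."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        self.data = {name: {} for name in self.shards}

    def write(self, key, value):
        # The write lands on exactly one shard, so write load is split, not copied.
        self.data[shard_for(key, self.shards)][key] = value

    def read(self, key):
        # The same hash tells the reader which shard to ask.
        return self.data[shard_for(key, self.shards)].get(key)
```

With simple modulo hashing like this, changing the number of shards moves most keys to a different server, which is part of why this is harder than it looks.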
On 3/17/06, Robin Haswell [EMAIL PROTECTED] wrote:

  my time has a cost and optimisation often buys less performance than,
  say, a Dell SC1425
  Unfortunately my time is not worth an IBM 64-way mainframe (or I would be
  one happy hacker). Bigger machines help, but as my comment said before,
  this will give you only linear optimization; at some point you will need
  _exponential_ optimizations. This also depends on the complexity of the
  data relationships that your application needs. You need a machine that
  is 64 times faster buy

 Nah mate you miss my point! Not bigger machines, *more* machines. A Dell
 SC1425 is a pretty low-end piece of kit, the idea is you use multiple
 machines.

 Let's say you have an application that is currently running at 100%
 above acceptable capacity. You can solve this problem in basically four
 ways:

 1. Buy hardware that is twice as powerful
 2. Perform optimisation, caching - etc.
 3. A combination of the 1) and 2)
 4. Buy another similar server and run them both

 In my experience, 4) is always the cheapest option, and requires less
 hassle than 2) and 3) (and less hassle is the TG way!). The trick is to
 make option 4 possible by asking questions like "What will happen if I
 use two app or database servers - or both?" early on in the build
 process. I do this for everything and it's served me right so far :-)
 Part of my personal PHP standard library is some wrappers around session
 management and database handling that means:

 1) All my session data is stored in the database, which means from then
 on I can implement *all* my persistent storage in an RDBMS.

 2) My database reads and my database writes are separated and
 controllable, so if we need to add replication it's possible to direct
 all writes to the master server and balance reads between the slaves.
 (Yes I said there are alternatives to the master/slave setup, but in web
 apps which are mostly read-heavy it's a pretty good solution anyway).

 -Rob

 PS. If you're interested in the "writing to one place" problem, you
 should look at m/cluster
 (http://www.continuent.com/index.php?option=com_content&task=view&id=211&Itemid=168).
 We have our own solution, but in general it's a pretty awesome setup for
 database scaling through multiple servers.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups TurboGears group.  To post to this group, send email to turbogears@googlegroups.com  To unsubscribe from this group, send email to [EMAIL PROTECTED]  For more options, visit this group at http://groups.google.com/group/turbogears  -~--~~~~--~~--~--~---


[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-18 Thread lateef jackson
LiveJournal is an excellent example; it is where I stole the idea of using memcached: http://www.linuxjournal.com/article/7451

My vision for scaling up is to use URL partitioning so that similar data all goes to the same host(s), with no shared cache. For example, on a news site all the tech articles could go to one server (or set of servers) that holds all the tech articles in its cache. This means invalidation has to iterate over a list of hosts, but them's the breaks.
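
A rough sketch of URL partitioning as described here, assuming the first path segment of the URL names the section. The host names, `PARTITIONS`, `hosts_for` and `invalidate` are hypothetical, and each host's cache is just a dict.

```python
# Hypothetical mapping from URL section to the hosts that cache that section.
PARTITIONS = {
    "tech": ["cache-tech-1", "cache-tech-2"],
    "sports": ["cache-sports-1"],
}
DEFAULT_HOSTS = ["cache-misc-1"]


def hosts_for(path):
    """Return the cache hosts responsible for a URL path such as /tech/some-article."""
    section = path.strip("/").split("/")[0]
    return PARTITIONS.get(section, DEFAULT_HOSTS)


def invalidate(path, caches):
    """Drop `path` from every host in its partition; `caches` maps host name to a dict."""
    for host in hosts_for(path):
        caches[host].pop(path, None)
```

Because nothing is shared between hosts, invalidation has to visit each host in the partition in turn, which is the cost mentioned above.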
On 3/17/06, Bob Ippolito [EMAIL PROTECTED] wrote:

 On Mar 17, 2006, at 3:17 PM, Robin Haswell wrote:

   my time has a cost and optimisation often buys less performance than,
   say, a Dell SC1425
   Unfortunately my time is not worth an IBM 64-way mainframe (or I would be
   one happy hacker). Bigger machines help, but as my comment said before,
   this will give you only linear optimization; at some point you will need
   _exponential_ optimizations. This also depends on the complexity of the
   data relationships that your application needs. You need a machine that
   is 64 times faster buy

  Nah mate you miss my point! Not bigger machines, *more* machines. A Dell
  SC1425 is a pretty low-end piece of kit, the idea is you use multiple
  machines.

  Let's say you have an application that is currently running at 100%
  above acceptable capacity. You can solve this problem in basically four
  ways:

  1. Buy hardware that is twice as powerful
  2. Perform optimisation, caching - etc.
  3. A combination of the 1) and 2)
  4. Buy another similar server and run them both

  In my experience, 4) is always the cheapest option, and requires less
  hassle than 2) and 3) (and less hassle is the TG way!). The trick is to
  make option 4 possible by asking questions like "What will happen if I
  use two app or database servers - or both?" early on in the build
  process. I do this for everything and it's served me right so far :-)
  Part of my personal PHP standard library is some wrappers around session
  management and database handling that means:

 Scaling horizontally, what you list as 4, is the only real option.
 There's plenty of public record that shows that all the successful
 guys (Google and LiveJournal come to mind) are using lots of
 relatively cheap servers, rather than small numbers of giant
 servers.  If you design for that, you'll never have a problem so long
 as you can afford to operate, and that's not so tough of a problem
 because the costs are at worst linear.  With any other option, the
 price to upgrade grows exponentially and there's a ceiling on what
 kind of power you can even buy to run an app that is mostly serial.

 Good optimizations can do wonders in the short term, e.g. cut
 immediate hardware costs in half... but you get that anyway if you
 wait about a year.  It's typically better to expand your service such
 that it maximizes profits, rather than optimize your service to
 minimize your overhead.  There's only so low you can go with cutting
 your overhead.. but there's no well defined ceiling for maximum




[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Robin Haswell

 I've still got some decisions to make such as FastCGI vs mod_python - 
 anybody have the pro's and con's between these two?

My experience with setting up shared hosts tells me this: FastCGI is
slow but works with SuEXEC so is secure. mod_python is fast but all
processes run as the Apache user - draw your own conclusions from this.

What really solves the problem is the Apache Perchild MPM - mod_* speed
with suexec capabilities. Unfortunately for some unfathomable reason
perchild isn't finished and isn't being developed :'(

Is this accurate?

-Rob




[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Robin Haswell



Gerhard Häring wrote:
 Robin Haswell wrote:
 
I've still got some decisions to make such as FastCGI vs mod_python - 
anybody have the pro's and con's between these two?

My experience with setting up shared hosts tells me this: FastCGI is
slow but works with SuEXEC so is secure. mod_python is fast but all
processes run as the Apache user - draw your own conclusions from this.
 
 
 Not considering benchmarks of hello-world style apps, is 
 FastCGI/SCGI/mod_proxy really noticably slower than using mod_python for 
 real applications?

My experience is really only with PHP, where the startup times are quite
high (PHP doesn't have the concept of runtime module selection as such).
However, I can't imagine any situation where FCGID would be quicker than
mod_python, as mod_python's Python interpreter runs within the Apache
process itself.

 
 
What really solves the problem is the Apache Perchild MPM - mod_* speed
with suexec capabilities. Unfortunately for some unfathomable reason
perchild isn't finished and isn't being developed :'( [...]
 
 
 Apparently developing something like the Apache perchild MPM is hard. 
 Because it didn't get fixed and it had several problems, it got finally 
 scrapped in Apache 2.2.
 
 There were/are several attempts to develop a replacment, none of them 
 ready for production yet, according to their developers:
 
 metuxmpm: http://www.sannes.org/metuxmpm/
 itk mpm: http://home.samfundet.no/~sesse/mpm-itk/
 peruser mpm: http://www.telana.com/peruser.php
 
 -- Gerhard
 
 




[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Lateef

Most things seem to be covered except for caching, so I will share my
witless drivel about caching.
First you need to decide how dirty your data can get. A realtime stock
quote system probably can't live with a lot of dirt, whereas blog
comments probably don't need to be instant (well, at least the ones I
write probably don't need to be read at all). The second problem is
that the web is stateless, so you can't send updates down the socket to
the web browser. Thanks to TG's widgets and awesome AJAX support this
can be minimized (and there is much rejoicing, thanks!).

It doesn't matter how you architect the system, writes end up in one
place. Sure, you can do replication, but there is a world of setup,
configuration, schema changes, transactions and data synchronization, oh
my! You are still only writing in one place. Plus, replication is just
a linear optimization; caching can get you exponential optimization
without all the lions, tigers and bears. Cache only after you have
tried some happy hacking to make things faster; my experience from
a couple of years in EJB (oh the humanity!) was "cache it and forget".
Most of the performance issues with EJB have to do with locking data so
that there is no dirty data, plus the threading architecture is... Sorry,
I digress often. Just cache as little as possible.

In the big project I work on we cache at a couple of different levels.
The highest level is caching the page, thus skipping the template engine,
controller code and database lookup. In this project we use memcached
because it is very flexible in how we set up the caching system. No
matter what, you should create a wrapper around your caching
implementation so you can move from one type of cache to another without
changing any controller code. memcached supports a number of different
languages, which is nice for integration.
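
A minimal sketch of such a wrapper, assuming controllers only ever talk to a `Cache` object and the backend behind it can be swapped. `DictBackend`, `Cache` and `get_or_compute` are hypothetical names; the in-process dict backend stands in for a real cache server such as memcached, whose client API is not shown here.

```python
import time


class DictBackend:
    """In-process stand-in for a real cache server."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            # Entry has passed its expiration time, so treat it as a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.time() + ttl
        self._items[key] = (value, expires_at)

    def delete(self, key):
        self._items.pop(key, None)


class Cache:
    """The only cache interface controller code should see."""

    def __init__(self, backend):
        self.backend = backend

    def get_or_compute(self, key, compute, ttl=None):
        # A value of None is treated as a miss and recomputed on every call.
        value = self.backend.get(key)
        if value is None:
            value = compute()
            self.backend.set(key, value, ttl)
        return value

    def invalidate(self, key):
        self.backend.delete(key)
```

A controller would call something like `cache.get_or_compute("/news", render_news_page, ttl=60)`, where `render_news_page` is whatever function builds the page. Moving to another cache then means writing one new backend class with the same three methods.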

There are many ways to invalidate cached data. The easiest way is to
just set an expiration date. Since we are happy PostgreSQL users we use
a trigger system (sure, features like triggers make the database slower,
but I want the entire system to run faster, not just the database).
There are two ways we could do this. The first is to write a Python
function that gets called when an update or delete happens on cached
data. The Python function in our case invalidates all the caches that
held that data. The second way is to have a trigger insert records into
an invalidation table. Every n seconds an external process
consumes the records by invalidating your caches (which, by the way, is
basically how Slony works for replication).
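
A small sketch of the second approach, the invalidation table. To keep it self-contained it uses SQLite from Python's standard library rather than PostgreSQL, so the trigger syntax differs from what a PostgreSQL setup would use. The `article` and `cache_invalidation` tables, the `article:<id>` key scheme and `consume_invalidations` are all hypothetical, and the cache is a plain dict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE cache_invalidation (cache_key TEXT);

-- Queue the cache key whenever a cached row changes or disappears.
CREATE TRIGGER article_updated AFTER UPDATE ON article
BEGIN
    INSERT INTO cache_invalidation (cache_key) VALUES ('article:' || NEW.id);
END;
CREATE TRIGGER article_deleted AFTER DELETE ON article
BEGIN
    INSERT INTO cache_invalidation (cache_key) VALUES ('article:' || OLD.id);
END;
""")


def consume_invalidations(conn, cache):
    """Drop every queued key from `cache` and clear the processed queue rows."""
    rows = conn.execute(
        "SELECT rowid, cache_key FROM cache_invalidation"
    ).fetchall()
    for rowid, key in rows:
        cache.pop(key, None)
        conn.execute("DELETE FROM cache_invalidation WHERE rowid = ?", (rowid,))
    conn.commit()
    return len(rows)


# Usage: cache an article, change it, then let the consumer catch up.
conn.execute("INSERT INTO article (id, title) VALUES (1, 'Old title')")
cache = {"article:1": "Old title"}
conn.execute("UPDATE article SET title = 'New title' WHERE id = 1")
processed = consume_invalidations(conn, cache)
```

In a real deployment the consumer would be the external process that runs every n seconds, so cached data can be stale for up to that long.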

 * Optimize code and queries first
 * Keep an abstraction layer or two between your code and the cache implementation
 * Application servers scale easy as pie, database servers scale like hernias
 * TurboGears rules!

Good luck





[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Karl Guertin

On 3/17/06, Lateef [EMAIL PROTECTED] wrote:
 The second problem
 is the web is stateless so you can't send updates down the socket to
 the web browser.

Half true. HTTP is stateless, but you can send updates down the socket.

http://alex.dojotoolkit.org/?p=545

Cool, eh?




[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Robin Haswell

My $0.02 (approx £0.012 where I come from) on this:

In my company, an application that scales is an application that you can 
throw hardware at without having to think about it. We generally don't 
bother with intricate caching and optimisations, because my time has a 
cost and optimisation often buys less performance than, say, a Dell 
SC1425. I guess what I want to know about this is, are there any parts 
of TG's standard setup (with a separate DB server) that store changes on 
the application machine? Stuff like sessions - are sessions stored 
on-disk/memory? And if they are... why?

Also, I'm a firm believer that any app has a potential to become really 
popular, and will potentially need more hardware. In other words, 
optimisation is fighting a losing battle.

I'm not saying don't optimise - by all means, make sure your indexes are 
correct, make sure you're running the right SQL statements and make sure 
you're not doing needless work, but after that I don't believe in 
investing time and effort in caching mechanisms when an expansion path 
will give you more bang for your buck. Sure, there are applications where 
caching is obvious, but those aren't the applications I'd usually choose 
TurboGears for.

-Rob

PS. You don't always write to one place :-) Most people do, but there 
are solutions that are fast and safe, you just need to think outside the 
box (but I don't think I can say much more than that).





[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Lateef

Yeah, I knew I would get roasted like a pig for not pointing out the
exceptions. I hadn't seen this stateful HTTP exception before; it is
cool!





[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Lateef

I am not a TG expert, but I believe the options are file, memory,
database and roll your own.

 my time has a cost and optimisation often buys less performance than,
 say, a Dell SC1425

Unfortunately my time is not worth an IBM 64-way mainframe (or I would be
one happy hacker). Bigger machines help, but as my comment said before,
this will give you only linear optimization; at some point you will need
_exponential_ optimizations. This also depends on the complexity of the
data relationships that your application needs. If you need a machine that
is 64 times faster, buy a 64 proc machine, but if you need a machine 1x,
then start hacking.

 You don't always write to one place

My experience, although limited, did include some trading systems, and
over on this side of the pond I have not seen any applications that wrote
to more than one place. I wish I got to work on a project that required
this kind of technical scaling, but alas I am a bottom feeder :) Data is
stored in an RDBMS or a mainframe, so I cannot comment on such fancy stuff.

Cheers,
Lateef





[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Robin Haswell


 my time has a cost and optimisation often buys less performance than,
 say, a Dell SC1425
 Unfortunately my time is not worth an IBM 64-way mainframe (or I would be
 one happy hacker). Bigger machines help, but as my comment said before,
 this will give you only linear optimization; at some point you will need
 _exponential_ optimizations. This also depends on the complexity of the
 data relationships that your application needs. You need a machine that
 is 64 times faster buy

Nah mate you miss my point! Not bigger machines, *more* machines. A Dell 
SC1425 is a pretty low-end piece of kit, the idea is you use multiple 
machines.

Let's say you have an application that is currently running at 100% 
above acceptable capacity. You can solve this problem in basically four 
ways:

1. Buy hardware that is twice as powerful
2. Perform optimisation, caching - etc.
3. A combination of the 1) and 2)
4. Buy another similar server and run them both

In my experience, 4) is always the cheapest option, and requires less 
hassle than 2) and 3) (and less hassle is the TG way!). The trick is to 
make option 4 possible by asking questions like "What will happen if I 
use two app or database servers - or both?" early on in the build 
process. I do this for everything and it's served me right so far :-) 
Part of my personal PHP standard library is some wrappers around session 
management and database handling that means:

1) All my session data is stored in the database, which means from then 
on I can implement *all* my persistent storage in an RDBMS.

2) My database reads and my database writes are separated and 
controllable, so if we need to add replication it's possible to direct 
all writes to the master server and balance reads between the slaves. 
(Yes I said there are alternatives to the master/slave setup, but in web 
apps which are mostly read-heavy it's a pretty good solution anyway).
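
A rough sketch of point 2, written in Python rather than PHP and assuming the wrapper can tell reads from writes by the leading SQL verb. `RoutingDB` and `connection_for` are hypothetical names, and the "connections" can be any objects (plain strings in the usage note below).

```python
import itertools


class RoutingDB:
    """Send writes to the master and spread reads across the replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._readers = itertools.cycle(replicas or [master])

    def _is_write(self, sql):
        # Crude check on the first keyword; good enough for a sketch.
        return sql.lstrip().upper().startswith(self.WRITE_VERBS)

    def connection_for(self, sql):
        """Return the master for writes, or the next replica in turn for reads."""
        return self.master if self._is_write(sql) else next(self._readers)
```

For example, `RoutingDB("master", ["replica1", "replica2"])` hands back `"master"` for an `INSERT` and alternates between the two replicas for `SELECT` statements. Because application code only calls the wrapper, replication can be added later without touching it. With asynchronous replication a read issued right after a write may not see that write yet.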

-Rob

PS. If you're interested in the "writing to one place" problem, you 
should look at m/cluster 
(http://www.continuent.com/index.php?option=com_content&task=view&id=211&Itemid=168).
We have our own solution, but in general it's a pretty awesome setup for 
database scaling through multiple servers.




[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-17 Thread Bob Ippolito


On Mar 17, 2006, at 3:17 PM, Robin Haswell wrote:



  my time has a cost and optimisation often buys less performance than,
  say, a Dell SC1425
  Unfortunately my time is not worth an IBM 64-way mainframe (or I would be
  one happy hacker). Bigger machines help, but as my comment said before,
  this will give you only linear optimization; at some point you will need
  _exponential_ optimizations. This also depends on the complexity of the
  data relationships that your application needs. You need a machine that
  is 64 times faster buy

 Nah mate you miss my point! Not bigger machines, *more* machines. A Dell
 SC1425 is a pretty low-end piece of kit, the idea is you use multiple
 machines.

 Let's say you have an application that is currently running at 100%
 above acceptable capacity. You can solve this problem in basically four
 ways:

 1. Buy hardware that is twice as powerful
 2. Perform optimisation, caching - etc.
 3. A combination of the 1) and 2)
 4. Buy another similar server and run them both

 In my experience, 4) is always the cheapest option, and requires less
 hassle than 2) and 3) (and less hassle is the TG way!). The trick is to
 make option 4 possible by asking questions like "What will happen if I
 use two app or database servers - or both?" early on in the build
 process. I do this for everything and it's served me right so far :-)
 Part of my personal PHP standard library is some wrappers around session
 management and database handling that means:

Scaling horizontally, what you list as 4, is the only real option.   
There's plenty of public record that shows that all the successful  
guys (Google and LiveJournal come to mind) are using lots of  
relatively cheap servers, rather than small numbers of giant  
servers.  If you design for that, you'll never have a problem so long  
as you can afford to operate, and that's not so tough of a problem  
because the costs are at worst linear.  With any other option, the  
price to upgrade grows exponentially and there's a ceiling on what  
kind of power you can even buy to run an app that is mostly serial.

Good optimizations can do wonders in the short term, e.g. cut  
immediate hardware costs in half... but you get that anyway if you  
wait about a year.  It's typically better to expand your service such  
that it maximizes profits, rather than optimize your service to  
minimize your overhead.  There's only so low you can go with cutting  
your overhead.. but there's no well defined ceiling for maximum  
profits (look at Google!).

-bob





[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-16 Thread Karl Guertin

On 3/16/06, ajones [EMAIL PROTECTED] wrote:
 I was wondering how well turbogears scales? Obviously interesting
 caching tricks can be done on the web face, and other strange voodoo
 can be done in the database, but is turbogears itself designed to
 scale?

Performance is not a core priority for TG. Almost any technology can
scale with the correct server architecture and proper caching tricks.
TG is no exception here.

 Can I host the controller, for instance, on multiple servers for better
 response time?

Nothing in TG prevents you from doing this. The biggest restriction
would probably be using the Identity framework with the default
sqlobject provider, as that hits the database every pageview. I'm
pretty sure you can write your own provider so that it doesn't do
this, but I don't know how you would go about doing this.

 If you had to design a truly big system, or a lot of little
 interconnected remote sites, and wanted TG to do it what are the
 options?

I've never done this, only read about it, so I'll leave that to
someone else. Word on the street is that the magic Google phrase is
'shared nothing'.

 Can I do this entire post entirely with questions? I think not.

So close!




[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-16 Thread Justin Johnson


  I was wondering how well turbogears scales? Obviously interesting
  caching tricks can be done on the web face, and other strange voodoo
  can be done in the database, but is turbogears itself designed to
  scale?

I don't see any reason why it shouldn't scale.  A lot of it boils down 
to the nature of your app.  Python as a language is easily good enough 
to power large systems.  Ruby, Perl and PHP have all been used to 
implement large systems and they're arguably slower than Python.

So a lot depends on your webserver configuration and your system 
architecture.

Cache what you can, cut down database hits etc.

  Can I host the controller, for instance, on multiple servers for better
  response time? If not, would it even be a good idea to implement that
  kind of functionality?

Yes - here's how I'm approaching this:

I'm planning to run Apache with either FastCGI or mod_python.  CherryPy 
will sit behind Apache and handle my controllers.  I'm not going to use 
sessions although I'll use minimal cookie data to store state.  I'm 
going to follow the mantra of sharing nothing.

The controllers will access the database server using SQLObject.  The 
database server is running Postgres.  Another server will be used for 
storing user images and files using the native flat file system.

Scaling then: when I need to scale I'll add another web server running 
Apache and TurboGears/CherryPy.  I'll load balance between these web 
servers probably in a round-robin way to begin with.  There are hardware 
and software solutions to load balancing.  So, different web servers 
will be able to handle different requests from the same user.  This is 
why sharing nothing is a good idea.  Try not to store state on your web 
servers.
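
A toy sketch of why sharing nothing makes round-robin balancing safe, assuming every request carries a small cookie value and all other state lives in a shared database. `RoundRobinBalancer`, `handle`, the server names and the request fields are hypothetical; a real setup would use a hardware or software load balancer in front of the web servers.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


def handle(server, request, database):
    """Any server can answer, because all state is read from the shared `database`."""
    user = database["users"][request["cookie_user_id"]]
    return "%s served %s for %s" % (server, request["path"], user)
```

Consecutive requests from the same user can land on different servers and still get the same answer, since no server keeps session state of its own.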

As the number of web servers increases they'll begin to stress the 
database.  The database server will become the bottleneck.  Cache if you 
can.  Next on the cards then is some form of database replication, 
although that's far enough into the future not to worry 
about just now.  :-)

Bottom line: I think the framework is good enough to trigger your 
controllers and deliver templated responses in a timely fashion.  The 
rest is down to external factors that apply to any other web system out 
there.

I've still got some decisions to make such as FastCGI vs mod_python - 
anybody have the pros and cons of these two?







[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-16 Thread Justin Johnson


I wrote previously:

  I've still got some decisions to make such as FastCGI vs mod_python -
  anybody have the pros and cons of these two?

Sorry, after doing some Googling and looking around it appears I should 
be considering SCGI rather than FastCGI.


 





[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-16 Thread Jorge Godoy

Justin Johnson [EMAIL PROTECTED] writes:

 I wrote previously:

   I've still got some decisions to make such as FastCGI vs mod_python -
   anybody have the pros and cons of these two?

 Sorry, after doing some Googling and looking around it appears I should 
 be considering SCGI rather than FastCGI.

I dunno how all of that will play with First Class (TG + WSGI), since that
looks like it will be the future...

Anyway, the best thing you can do is benchmark *your* application.  You
should also think about using mod_proxy: CherryPy is pretty good and fast, but
combined with Apache's cache it goes even further than one would expect.  One
advantage of such a configuration is not having to take Apache down to
upgrade your TG website (I usually run more than one application
here, so instead of taking all of them down I just restart the TG app and it
is updated ;-)).

-- 
Jorge Godoy  [EMAIL PROTECTED]




[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-16 Thread Jonathan LaCour

 I was wondering how well turbogears scales? Obviously interesting
 caching tricks can be done on the web face, and other strange voodoo
 can be done in the database, but is turbogears itself designed to
 scale?

TurboGears itself is definitely designed to scale exactly the same  
way you scale most LAMP type systems.  You scale it the same way  
that you would scale Rails applications, incidentally.  There is  
certainly some additional work that could be done with caching, etc.,  
but overall it is definitely designed to scale if you do it right.

The real question is: is your TurboGears application designed to  
scale?  I will get to this below.

 Can I host the controller, for instance, on multiple servers for  
 better
 response time? If not, would it even be a good idea to implement that
 kind of functionality?

If you properly design your application to be as stateless as  
possible, load balancing TurboGears is actually quite easy.  Here is  
the short list of how to make this happen:

1. Don't use sessions, use cookies.  If you absolutely *have* to
   use sessions, don't use in-memory sessions.  Use database
   sessions if you have to.

2. Don't store any state inside your controllers, or in memory.

If you do those things, then you can fire up as many processes as you  
like, on as many servers as you like, and use lighttpd's built-in  
load-balancing to point at your application processes.  I have done  
this myself with SCGI and lighttpd, and it's crazy fast.
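
To make the "use cookies, not sessions" rule concrete, here is a rough sketch of a signed cookie value using Python's standard `hmac` module. The names `SECRET_KEY`, `sign` and `verify` are hypothetical and this is not a TurboGears or CherryPy API; the assumption is that every application process shares the same secret, so any of them can validate a cookie without server-side state.

```python
import hashlib
import hmac

# Hypothetical shared secret; every app process must be configured with it.
SECRET_KEY = b"change-me"


def _mac(value):
    """Hex HMAC-SHA256 of the cookie value under the shared secret."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(value):
    """Return 'value|mac', suitable for storing in a cookie."""
    return value + "|" + _mac(value)


def verify(cookie):
    """Return the original value if the signature matches, else None."""
    value, sep, mac = cookie.rpartition("|")
    if not sep:
        return None  # no signature present
    if hmac.compare_digest(mac.encode("utf-8"), _mac(value).encode("utf-8")):
        return value
    return None
```

Signing keeps the client from tampering with the value, but it does not hide it, so only minimal, non-secret state belongs in the cookie.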

A lot of people tend to focus on caching and minimizing trips to the  
database when it comes to scaling, but I think the above is much more  
important at an architectural level.  This is how you should  
structure your application, and you can get to optimizing with  
caching and minimizing database hits later.  Both caching and  
database optimization take considerably more time and effort than  
throwing a few extra processes at the problem.

 If you had to design a truly big system, or a lot of little
 interconnected remote sites, and wanted TG to do it what are the
 options?

Too many options to speak of, but the WSGI future of TurboGears will  
make this much easier than it is today (not that the situation is all  
that bad today!).

Best of luck scaling!  I don't suspect you will have too many  
problems if you follow some of the simple rules mentioned in this  
thread already.

--
Jonathan LaCour
http://cleverdevil.org






[TurboGears] Re: Open Question: Turbogears and scaling...

2006-03-16 Thread Kevin Dangoor

On 3/16/06, Jorge Godoy [EMAIL PROTECTED] wrote:

 Justin Johnson [EMAIL PROTECTED] writes:
  Sorry, after doing some Googling and looking around it appears I should
  be considering SCGI rather than FastCGI.

 I dunno how all of that will play with FirstClass (TG + WSGI) since this is
 looking like it will be the future...

CherryPy already uses WSGI on the front end, so First Class will not
actually change web server deployment.

(What First Class changes is how you can compose your applications and
reuse bits from elsewhere.)
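
For readers who have not seen WSGI, the interface in question is just a callable taking `environ` and `start_response`. The sketch below is a generic WSGI application exercised with the standard library's `wsgiref`; it is not CherryPy's or TurboGears' own code, and `app` and `call_app` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: one callable, no framework required."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(application, path="/"):
    """Invoke a WSGI app the way a server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

Any WSGI-capable server or gateway can host such a callable, which is why the choice of front end is independent of how the application itself is composed.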

Kevin
