[TurboGears] Re: Scalability

Bob Ippolito Mon, 02 Oct 2006 14:20:22 -0700

On 10/2/06, Stuart Clarke <[EMAIL PROTECTED]> wrote:
>
> On Mon, 2006-10-02 at 12:19 -0700, Bob Ippolito wrote:
> > On 10/2/06, Stuart Clarke <[EMAIL PROTECTED]> wrote:
> > >
> > > On Mon, 2006-10-02 at 10:45 -0700, Bob Ippolito wrote:
> > > > On 10/2/06, Stuart Clarke <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > On Mon, 2006-10-02 at 10:04 -0700, Bob Ippolito wrote:
> > > > > > On 10/2/06, Stuart Clarke <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > Thanks for the reply Kevin...
> > > > > > >
> > > > > > > > > > I want to ask some questions, however, about scalability.  I'm
> > > > > > > > > > developing a web system (the pages of which will be customised on a
> > > > > > > > > > per-user basis), that may grow to be quite popular.  I need to
> > > > > > > > > > implement this, such that it's horizontally scalable in an indefinite
> > > > > > > > > > manner.
> > > > > > > > >
> > > > > > > > > > OK, so web server replication and load balancing is easy.  My problem
> > > > > > > > > > is with the DB.  I can find several good-looking master-slave DB
> > > > > > > > > > replicators (Slony for PG, for example), but I can't find a suitable
> > > > > > > > > > load-balancing mechanism, especially one that integrates with SQLObject
> > > > > > > > > > or SQLAlchemy.
> > > > > > > >
> > > > > > > > > I'm not sure what you mean here. In what way is the ORM involved with
> > > > > > > > > the database replication? Do you mean from the standpoint of having
> > > > > > > > > some collection of web servers talk to some specific collection
> > > > > > > > > of database servers?
> > > > > > >
> > > > > > > > *** As I see it, there are two problems in using a distributed
> > > > > > > > master-slave arrangement for the DB: replication (i.e. mirroring data
> > > > > > > > from the master to the slaves) and load balancing (i.e. balancing the
> > > > > > > > "DB-read" load across the slaves).
> > > > > > >
> > > > > > > > Replication is handled by tools such as Slony.  What I need from the ORM
> > > > > > > > (or whatever) is a mechanism for load balancing.  I need to be able to
> > > > > > > > say: here's my master server (for writing) and here is my list of slave
> > > > > > > > servers (for reading).  Please balance the system load appropriately,
> > > > > > > > across these servers.  Or I need a hook where I can insert code of my
> > > > > > > > own to do this.
> > > > > > >
> > > > > > > > I have a sneaking suspicion that it might be possible in SQLAlchemy, but
> > > > > > > > I don't think it will integrate out of the box with TG's Identity
> > > > > > > > implementation.
> > > > > > >
> > > > > > > Plus, I would like to do it in SQLObject, so I can have Catwalk.
> > > > > > >
> > > > > > > Any suggestions?
> > > > > >
> > > > > > > Why don't you do load balancing at the DB layer with pgpool or something?
> > > > >
> > > > > > *** pgpool is limited to one master, and one slave.  Its scalability is
> > > > > > therefore quite limited.
> > > > >
> > > > > I haven't found any general-purpose tools which can provide unlimited
> > > > > (say, >20 slaves) scalability for either MySQL or PostgreSQL.  Does
> > > > > anyone know of one?
> > > >
> > > > Well, you only want one master... at least for any of the free
> > > > PostgreSQL replication solutions. I always partition my usage between
> > > > read-only and read-write connections, so it's rather easy to make that
> > > > work.
> > >
> > > *** I'm happy with only one master, and I also wish to partition my
> > > usage between read-only and read-write connections.  I want to do that,
> > > however, within a single Turbogears "application".  Is this what you do?
> > > Can you provide some hints on how to do it?  Also, do you know of a way
> > > to pool your read-only connections to a number of slaves, thereby
> > > distributing the load (within either SO or SA)?
> >
> > I'm not currently using any ORM, so using different SA engines for
> > different queries is trivial. I'm also not currently distributing
> > among several slaves, but if I had to I would use something like SQL
> > Relay rather than trying to shove load balancing into my model.
> >
> > > SQL Relay seems capable of load-balancing across a number of read-only
> > > DBs.  And it has a drop-in replacement API for PostgreSQL.  It doesn't
> > > distinguish between master and slaves, however, and so can't
> > > automatically manage the difference between reading and writing.  Also,
> > > SQL Relay load-balances on a per-connection basis, which runs contrary to
> > > TurboGears' persistent-connection architecture.  Which sucks.  Have you
> > > any experience with using SQL Relay under TurboGears?
> >
> > The reason you load balance is so that you get better concurrency. For
> > serial requests one database is going to do just as well as (if not
> > better than, due to cache effects) a pool. The way TG and SQL Relay
> > would interact is fine, because you get different connections for each
> > thread in the TG pool. Concurrent queries will be sent to different
> > servers (at the discretion of SQL Relay of course), so load balancing
> > still does exactly what it's supposed to.
>
> I'm not so sure that TG and SQL Relay would play nice with each other.
> AFAIK, SA and SO maintain (within the context of TG) connection pools to
> the DB.  Now, these connections can be created on-demand, but the
> problem is that they persist and never go away.  Say I have 10 servers
> under SQL Relay, and a max of 10 connections in my SA pool.  I start
> using the system, and the 10 connections are quickly established.  Can I
> be guaranteed that I will have 1 connection per server, with no servers
> idle?  I'm not so sure about that.  I suspect that SQL Relay is designed
> to work in a dynamic system, where connections are opened and closed all
> the time, and are allocated to a particular server based on the current
> weather and alignment of the planets, etc.
>
> We might be getting a little OT here :)


I suspect that your suspicions are unfounded. SQL Relay's load
balancing feature is designed to spread out connections to databases
by a given metric, the default being equal priority for all of them.
If this is actually based on the alignment of the planets, then SQL
Relay is broken and doesn't behave as documented.
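
To make the "1 connection per server" question concrete, here is a
minimal sketch of what an equal-priority policy would do with ten
persistent connections. It assumes a plain weighted round-robin
allocator; that is an illustration only, not SQL Relay's actual
algorithm or API, and the server names and the assign_connections
helper are hypothetical.

```python
from collections import Counter
from itertools import cycle


def assign_connections(servers, weights, n_connections):
    """Assign persistent connections to servers by weighted round-robin.

    Hypothetical helper for illustration; not part of SQL Relay.
    """
    # Repeat each server according to its weight, then cycle through the list.
    schedule = cycle([s for s, w in zip(servers, weights) for _ in range(w)])
    return [next(schedule) for _ in range(n_connections)]


servers = [f"db{i}" for i in range(10)]  # hypothetical slave names
weights = [1] * len(servers)             # equal priority for every server
pool = assign_connections(servers, weights, 10)
per_server = Counter(pool)               # connections held per server
```

Under that assumption, ten long-lived connections land one per server
and none sit idle. Whether the real balancer behaves the same way is
something to check against SQL Relay's documentation.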

-bob
