A lot is going to depend on exactly what you are planning on doing with the site. Will there be a lot of logged-in users? How often will data be changing? Are you going to have a lot of complex queries (e.g. searches)?

You said yesterday that the DB size would be about 20 GB. That alone will cause a performance hit: the tables won't really fit into memory, and the database will take up a huge chunk of that single server.

If you don't have many logged-in users and the data isn't changing all that much, then you *may* be able to get by using Boost. If you are going to have a bunch of logged-in users, then I would seriously look at using alternative caching like Cacherouter. With a 20 GB database, the more you can keep off of it the better.

If you have a lot of images and static content, then I would also seriously look at pushing that off to a CDN to remove some of the burden on the server.

From a development standpoint, MySQL's slow query log is your friend, plus the devel and performance logging modules. Make sure none of your common queries are doing nasty things like resorting to filesorts on thousands of rows, and that all your queries are indexed properly.
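As a rough illustration of the "filesort vs. index" check, here is a small Python sketch. It uses SQLite's EXPLAIN QUERY PLAN as a stand-in, since that runs anywhere without a server; on MySQL you would run EXPLAIN on the query and look for "Using filesort" in the Extra column instead. The helper name and the table and column names in the usage note are made up for the example.

```python
import sqlite3


def needs_sort_pass(db, sql):
    """Return True if SQLite has to sort the result in a temporary
    B-tree (roughly its counterpart of a MySQL filesort)."""
    plan = db.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable plan text is the last column of each row.
    return any("USE TEMP B-TREE" in row[-1] for row in plan)
```

For example, with a hypothetical table `node (nid INTEGER PRIMARY KEY, created INTEGER, title TEXT)`, the query `SELECT nid, created FROM node ORDER BY created` should need a sort pass until an index on `created` exists.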

Also when dealing with caching be very careful. One thing I have seen a lot of is people who do "on demand" refreshing of expired caches. What happens is that they check the expiration or some other metric when the cache is pulled and if it fails they run the query or code to regenerate it. This is usually used on very server intensive queries. The problem lies in this example.

You have a query that takes 4 seconds to run.

- User A hits the site at 00:00:00.00 and the cache needs refreshing, so the query is run.

- User B hits the site at 00:00:01.00. The query from user A is still running, so the cache has not been updated yet. User B doesn't know a refresh is already in progress, so the query is run again.
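To put a number on that, here is a small Python sketch that counts how often the slow query gets started under on-demand refreshing. The function name and the arrival times are made up for illustration; only the 4-second query comes from the example above. It assumes the cache does not expire a second time during the window.

```python
def count_on_demand_queries(arrival_times, query_seconds):
    """Count how many times the slow query is started when every request
    that finds the cache expired regenerates it on the spot."""
    cache_fresh_at = None  # when the earliest running query will refill the cache
    queries_started = 0
    for now in sorted(arrival_times):
        if cache_fresh_at is not None and now >= cache_fresh_at:
            continue  # a query has finished; this request is served from cache
        # The cache is still expired, so this request runs the query too.
        queries_started += 1
        finish = now + query_seconds
        if cache_fresh_at is None or finish < cache_fresh_at:
            cache_fresh_at = finish
    return queries_started
```

With one request per second, `count_on_demand_queries([0, 1, 2, 3, 4, 5], 4)` returns 4: every request that arrives before the first query finishes starts its own copy.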

On a high-traffic site you can see how that will snowball into a bunch of people running the same query. From a development standpoint, it's best to put these kinds of routines into a cron job so the following happens:

- User A hits the site at 00:00:00.00 and the cache needs refreshing. You have a special "cron" table in the DB, and a record is written saying that this item needs to be recomputed as of 00:00:00.00. User A is served the stale data.

- User B hits the site at 00:00:01.00 and the cache is still expired. The code checks for the record in that cron table and moves on, just serving the stale cache data.

Running cache refreshes on cron like this removes the possibility of the query being run multiple times for the same expired item.
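A minimal sketch of that flow in Python, assuming an in-memory SQLite table standing in for the "cron" table. The class, method, table, and column names are all hypothetical, and it assumes every item has been put in the cache at least once before it is read.

```python
import sqlite3


class DeferredCache:
    """Serve stale data on expiry and leave regeneration to a cron run."""

    def __init__(self, regenerate):
        self.regenerate = regenerate  # the expensive query, supplied by the caller
        self.values = {}              # key -> (value, expires_at)
        self.db = sqlite3.connect(":memory:")
        # Stand-in for the "cron" table: one row per item awaiting a refresh.
        self.db.execute(
            "CREATE TABLE cache_cron (cache_key TEXT PRIMARY KEY, requested_at REAL)"
        )

    def set(self, key, value, expires_at):
        self.values[key] = (value, expires_at)

    def get(self, key, now):
        value, expires_at = self.values[key]
        if now >= expires_at:
            # INSERT OR IGNORE keeps the first request's row, so later
            # requests find the record already there and just move on.
            self.db.execute(
                "INSERT OR IGNORE INTO cache_cron VALUES (?, ?)", (key, now)
            )
        return value  # fresh or stale, a page request never runs the query

    def run_cron(self, now, ttl):
        """Called from the cron job: regenerate each flagged item once."""
        rows = self.db.execute("SELECT cache_key FROM cache_cron").fetchall()
        for (key,) in rows:
            self.values[key] = (self.regenerate(key), now + ttl)
            self.db.execute("DELETE FROM cache_cron WHERE cache_key = ?", (key,))
        return len(rows)
```

Users A and B both call `get()` and both receive the stale value; the slow query only runs when `run_cron()` is invoked, and only once per flagged item.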

On a cost comparison, two servers are sometimes cheaper than one. With the size of your database and your traffic predictions, you will probably end up having to dump a lot of extra hardware into that single server to make one "super server", whereas with one web server and one database server you could possibly get by with medium or large servers, since each would be tuned specifically for its job.

Jamie Holly
http://www.intoxination.net
http://www.hollyit.net


On 12/18/2009 10:47 AM, Walt Daniels wrote:
I listened to the presentation and found it interesting. It was missing some
of what I wanted. My site is too big for shared hosting but cannot afford
going beyond one dedicated machine. Clearly cache the hell out of everything
is probably the best advice but perhaps there are other tweaks that should
be looked at as well. A question I had submitted before the talk did not get
covered. I would like to see a graph, perhaps a nomogram, of something like
max hits per hour vs. appropriate technology (both hardware and software).

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Kieran Lal
Sent: Friday, December 18, 2009 12:09 AM
To: development
Subject: Re: [development] development with scalability in mind

On Thu, Dec 17, 2009 at 7:10 PM, Susan Stewart
<[email protected]>  wrote:
>  On 12/17/2009 09:31 AM, Kieran Lal wrote:
>>
>>  A more appropriate approach for a site of that size is to build a
>>  cluster of servers in a high availability configuration which provides
>>  more flexibility to use various web scaling technologies.  You'll see
>>  that's an approach taken with even moderately sized Drupal sites.
>>  I'll be covering all of this in quite a bit of detail in my
>>  presentation in 2.5 hours.
>
>  Unfortunately, I missed it due to a client meeting...is there a
>  transcript or recording of this anywhere?

The recorded video will be posted here:
http://acquia.com/community/resources/recorded_webinars

Keep in mind this was a one hour introductory webinar covering
scalability and performance for Drupal.  I covered a lot of material
quickly, and tried to touch on a lot of relevant performance and
scalability technologies and techniques.

Cheers,
Kieran

>
>  --Susan
>
>  --
>  "We all declare for liberty; but in using the same word we do not all mean
the same thing. With some the word liberty may mean for each man to do as he
pleases with himself, and the product of his labor; while with others, the
same word may mean for some men to do as they please with other men, and the
product of other men's labor. Here are two, not only different, but
incompatible things, called by the same name - liberty. And it follows that
each of the things is, by the respective parties, called by two different
and incompatible names - liberty and tyranny."
>  --Abraham Lincoln
>
>