--- mos <[EMAIL PROTECTED]> wrote:

> Correct, but the speed difference will more than make up for it. If you
> have a busy website, it is very important not to overtax the database
> server with frills, like the number of rows found. If you want to have
> the record counts, then you will have to realize it may slow down your
> web server by a factor of 5 or 10 compared to using Limits. If you think
> it is worth it, then by all means include it. Or you could have a button
> on the form so if the user really want an accurate count, he presses the
> button.
Sometimes you have to find a compromise that satisfies the business
requirements, even partially, and keeps your site efficient. For example,
you could show your users the number of hits found up to a maximum value,
say 1000. So you always limit to 1000, and if the query returned that many
rows, you could just say "Found more than 1000 hits" and let your users page
through up to 1000 rows.

I know many high-volume sites that restrict paging up to a certain maximum.
Google, for example, while it does show a statistical approximation of the
number of hits found based on precalculated matches per word (since their
data is read-only until the index is refreshed periodically by a background
process), won't let you page beyond 1000 hits:

http://www.google.com/search?q=sql&hl=en&lr=&start=1000&sa=N

So restricting the number of hits reported to a maximum value is a good
compromise, and at the same time you can take advantage of the SQL query
cache, because all subsequent paging requests for the same query, from any
user, will be very snappy -- even if a malicious attacker kept paging
through the query all day long.

> You would have to execute a second query to count the rows if you want a
> row count and use a Limit clause. This should be much faster because you
> are not returning the rows from the server, you're just counting them.

Actually, returning the rows isn't a performance problem in most cases.
What chokes MySQL is having to figure out all the matches in an involved
query. After doing some testing, I found that this is not done in linear
time. In other words, if a LIMIT 1000 query returns in 1 ms, a LIMIT 10000
won't necessarily return in 10 ms. Most likely it will take much longer --
again, depending on the nature of the query and the data.
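The capped compromise above can be sketched in application code. Here's a
minimal sketch in Python, using SQLite purely as a stand-in for MySQL and a
hypothetical "articles" table: the SQL string stays identical for every page
of the same search (so a server-side query cache like MySQL's would key on a
single entry), and one extra row beyond the cap is fetched only to decide
whether to say "more than 1000".

```python
import sqlite3

MAX_HITS = 1000   # paging cap
PAGE_SIZE = 20

# One fixed query string for every page of the same search; the "+ 1"
# extra row only signals that more hits exist beyond the cap.
CAPPED_QUERY = ("SELECT id, title FROM articles WHERE title LIKE ? "
                "ORDER BY id LIMIT " + str(MAX_HITS + 1))

def search_page(conn, pattern, page):
    rows = conn.execute(CAPPED_QUERY, (pattern,)).fetchall()
    truncated = len(rows) > MAX_HITS
    rows = rows[:MAX_HITS]
    label = ("Found more than %d hits" % MAX_HITS if truncated
             else "Found %d hits" % len(rows))
    # Page slicing happens in the application, not in the LIMIT clause,
    # so the cached query is reused for every page.
    start = page * PAGE_SIZE
    return rows[start:start + PAGE_SIZE], label

# Throwaway in-memory demo data (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO articles (title) VALUES (?)",
                 [("sql tip #%d" % i,) for i in range(1500)])

page, label = search_page(conn, "sql%", page=3)  # rows 60..79 of the capped result
print(label)       # Found more than 1000 hits
print(len(page))   # 20
```

SQLite has no query cache, of course; the sketch only illustrates the
string-identity idea, and on a real MySQL server a stable ORDER BY key is
what makes the application-side slicing deterministic.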
Therefore counting ALL the hits without returning data can be very expensive
and slow for a non-trivial query and data set, and this is why you should
avoid SQL_CALC_FOUND_ROWS: it can bring the server to its knees, since MySQL
can no longer stop processing once it finds the top 1000 -- it has to keep
finding potentially millions of hits.

> You also have to consider the hacker or disgruntled employee that will
> deliberately try to crash the server by repeating large queries. A Limit
> clause prevents that from happening.
>
> If you repeat the query with "select col1, col2 from table1 limit 10,10"
> it will be found in the query cache. As long as someone had executed the
> same SQL statement before, it will be found in the cache (unless it was
> so long ago that it got swapped out).
>
> Mike

Using "LIMIT offset, row_count" caches only a single page per query, not the
entire result. In other words, while MySQL will cache page number 2 of
"SELECT * FROM MyTable LIMIT 40, 20", it will have to re-execute the query
from scratch for any other page, like page number 3: "SELECT * FROM MyTable
LIMIT 60, 20". That's because MySQL performs the query cache lookup on the
query string, including the LIMIT clause. You can verify that by checking:

SHOW STATUS LIKE 'Qcache_hits';

Once you change the offset or row_count in the LIMIT clause of an otherwise
identical query, the number of query cache hits won't increase. Therefore if
you always use LIMIT 1000 (without an offset) for all subsequent paging
requests, you can take advantage of the SQL query cache.

Homam

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]