Well, I think you've just succeeded in demolishing my wonderful example; it turns out that we don't need to use a stored procedure to find a median after all ;-)
You're right that the algorithm I described in my original reply is a bit simplified and assumes an odd number of rows; it doesn't handle the case where the number of rows is even. I assume that was just for the convenience of the person who wrote the course materials I was teaching; they didn't want to get bogged down in the subtleties of the details of calculating a median. I have to admit I've never seen an SQL query that would compute a median before. I'm not sure I completely understand your query, particularly the GROUP BY and HAVING clauses - I know what GROUP BY and HAVING do in general, I'm just not sure what they are accomplishing in *this* case - but you're a mathematician so I'll assume that the query is accurate and will work for both odd and even numbered sets of rows ;-) It looks like I'll have to come up with a more bulletproof example of a stored procedure before I next teach the concepts. Rhino ----- Original Message ----- From: "Michael Stassen" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]>; "Wolfram Kraus" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; "news" <[EMAIL PROTECTED]> Sent: Tuesday, November 30, 2004 11:49 AM Subject: Re: Stored Procedure? > > [EMAIL PROTECTED] wrote: > > > news <[EMAIL PROTECTED]> wrote on 11/30/2004 07:58:18 AM: > > > > > >>Michael J. Pawlowsky wrote: > >> > >>>Because you will be downloading 500,000 rows... And I don't really > >>>think that was the point. Who cares what the example is. Personally I > >>> was quite impressed with great explanation. > >> > >>I second that it was a great explanation! Please don't get me wrong! > >>And furthermore I will only download one row two times: > >>select count -> one row > >>select ... limit count/2, 1 -> one row > > > > Who's to say that his SP can't use your method and work just that much > > faster? > > > > The point was that stored procedures can "automate" complex processes and > > that the end user of the database doesn't necessarily need to know how the > > process works (or how to code that process in SQL) if they know how to > > call the procedure and interpret its results. Stored procedures provides a > > method for a DBA to efficiently provide DB users with results they would > > normally need to code by hand to achieve. How many database users do you > > know that understand how to correctly compute a median value or generate a > > cross tab query? If you, the DBA, write stored procedures or UDFs to > > perform these and other "complex" tasks (relatively speaking) then you > > have simplified the end user's data access in some significant ways. There > > are much more complex things you can do with SPs than just computing > > medians but it made an EXCELLENT example. > > As a mathematician, I'd like to point out that medians aren't quite that > simple. "select ... limit count/2, 1" will not work at least half the time. > There are two possibilities: > > * count is odd - The median is the value in the middle, but count/2 is a > decimal, so you have something like LIMIT 13.5, 1. Mysql (4.1.7, anyway) > handles this by ignoring the decimal and gives the correct answer, but this > is problematic. The manual <http://dev.mysql.com/doc/mysql/en/SELECT.html> > clearly states "LIMIT takes one or two numeric arguments, which must be > integer constants." Hence we are relying on an undocumented feature which > could easily disappear. > > * count is even - In this case, there is no middle value! The median is > the average of the 2 values on either side of the middle. count/2 is a > positive integer, however, so "limit count/2, 1" will retrieve a row, but it > is *not* the median. > > Hence, network traffic is not an issue, but there is still work to be done. > You have to get the count, check if it is even or odd, then proceed > accordingly. In the even case, you have to retrieve two rows, then average > them. You can do all this in code on the client end, or do it on the server > in a stored procedure, making the client's life easier (and improving > his/her chances of getting it right). > > For completeness, here's a method to get the median in SQL: > > #### To get the median of the values in a column (val): > > CREATE TEMPORARY TABLE medians > SELECT x.val medians > FROM data x, data y > GROUP BY x.val > HAVING SUM(y.val <= x.val) >= COUNT(*)/2 > AND SUM(y.val >= x.val) >= COUNT(*)/2; > > SELECT AVG(medians) AS median FROM medians; > > DROP TABLE medians; > > #### To get the groupwise median of the values in a column (val) for each > #### value in another column (name): > > CREATE TEMPORARY TABLE medians > SELECT x.name, x.val medians > FROM data x, data y > WHERE x.name=y.name > GROUP BY x.name, x.val > HAVING SUM(y.val <= x.val) >= COUNT(*)/2 > AND SUM(y.val >= x.val) >= COUNT(*)/2; > > SELECT name, AVG(medians) AS median FROM medians GROUP BY name; > > DROP TABLE medians; > > Michael > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] > > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]