Well, I think you've just succeeded in demolishing my wonderful example; it
turns out that we don't need to use a stored procedure to find a median
after all ;-)

You're right that the algorithm I described in my original reply is a bit
simplified and assumes an odd number of rows; it doesn't handle the case
where the number of rows is even. I assume that was just for the convenience
of the person who wrote the course materials I was teaching; they didn't
want to get bogged down in the subtleties of the details of calculating a
median.

I have to admit I've never seen an SQL query that would compute a median
before. I'm not sure I completely understand your query, particularly the
GROUP BY and HAVING clauses - I know what GROUP BY and HAVING do in general,
I'm just not sure what they are accomplishing in *this* case - but you're a
mathematician so I'll assume that the query is accurate and will work for
both odd and even numbered sets of rows ;-)

It looks like I'll have to come up with a more bulletproof example of a
stored procedure before I next teach the concepts.

Rhino

----- Original Message ----- 
From: "Michael Stassen" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Wolfram Kraus" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; "news" <[EMAIL PROTECTED]>
Sent: Tuesday, November 30, 2004 11:49 AM
Subject: Re: Stored Procedure?


>
> [EMAIL PROTECTED] wrote:
>
> > news <[EMAIL PROTECTED]> wrote on 11/30/2004 07:58:18 AM:
> >
> >
> >>Michael J. Pawlowsky wrote:
> >>
> >>>Because you will be downloading 500,000 rows...  And I don't really
> >>>think that was the point. Who cares what the example is. Personally I
> >>> was quite impressed with great explanation.
> >>
> >>I second that it was a great explanation! Please don't get me wrong!
> >>And furthermore I will only download one row two times:
> >>select count -> one row
> >>select ... limit count/2, 1 -> one row
> >
> > Who's to say that his SP can't use your method and work just that much
> > faster?
> >
> > The point was that stored procedures can "automate" complex processes
and
> > that the end user of the database doesn't necessarily need to know how
the
> > process works (or how to code that process in SQL) if they know how to
> > call the procedure and interpret its results. Stored procedures provides
a
> > method for a DBA to efficiently provide DB users with results they would
> > normally need to code by hand to achieve. How many database users do you
> > know that understand how to correctly compute a median value or generate
a
> > cross tab query? If you, the DBA,  write stored procedures or UDFs to
> > perform these and other "complex" tasks (relatively speaking) then you
> > have simplified the end user's data access in some significant ways.
There
> > are much more complex things you can do with SPs than just computing
> > medians but it made an EXCELLENT example.
>
> As a mathematician, I'd like to point out that medians aren't quite that
> simple.  "select ... limit count/2, 1" will not work at least half the
time.
>   There are two possibilities:
>
> * count is odd -  The median is the value in the middle, but count/2 is a
> decimal, so you have something like LIMIT 13.5, 1.  Mysql (4.1.7, anyway)
> handles this by ignoring the decimal and gives the correct answer, but
this
> is problematic.  The manual
<http://dev.mysql.com/doc/mysql/en/SELECT.html>
> clearly states "LIMIT takes one or two numeric arguments, which must be
> integer constants."  Hence we are relying on an undocumented feature which
> could easily disappear.
>
> * count is even -  In this case, there is no middle value!  The median is
> the average of the 2 values on either side of the middle.  count/2 is a
> positive integer, however, so "limit count/2, 1" will retrieve a row, but
it
> is *not* the median.
>
> Hence, network traffic is not an issue, but there is still work to be
done.
>   You have to get the count, check if it is even or odd, then proceed
> accordingly.  In the even case, you have to retrieve two rows, then
average
> them.  You can do all this in code on the client end, or do it on the
server
> in a stored procedure, making the client's life easier (and improving
> his/her chances of getting it right).
>
> For completeness, here's a method to get the median in SQL:
>
> #### To get the median of the values in a column (val):
>
> CREATE TEMPORARY TABLE medians
> SELECT x.val medians
> FROM data x, data y
> GROUP BY x.val
> HAVING SUM(y.val <= x.val) >= COUNT(*)/2
> AND SUM(y.val >= x.val) >= COUNT(*)/2;
>
> SELECT AVG(medians) AS median FROM medians;
>
> DROP TABLE medians;
>
> #### To get the groupwise median of the values in a column (val) for each
> #### value in another column (name):
>
> CREATE TEMPORARY TABLE medians
> SELECT x.name, x.val medians
> FROM data x, data y
> WHERE x.name=y.name
> GROUP BY x.name, x.val
> HAVING SUM(y.val <= x.val) >= COUNT(*)/2
> AND SUM(y.val >= x.val) >= COUNT(*)/2;
>
> SELECT name, AVG(medians) AS median FROM medians GROUP BY name;
>
> DROP TABLE medians;
>
> Michael
>
>
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to