I believe that everyone is using the same words to mean different things.
Bruce, when you say "grouped around" what result would you like to see?
Make up some sample data, or use some of your real data, and demonstrate
what you expected from the database when you say "grouped around". When we
understand what you want, we can help you get there.
In the SQL sense of the phrase "group by", the engine treats all rows with
common values as a "set" of values and can apply any of the aggregating
functions (and/or other functions if you phrase them correctly) to the
"set" of values. Here's a simplified example (I left out a lot of usual
things like indexes and primary keys but I think you will see what I
mean):
CREATE TABLE student (
ID int auto_increment,
Name varchar(20)
)
CREATE TABLE subject (
ID int auto_increment,
Name varchar(20),
Section int,
Instructor varchar(20)
)
CREATE TABLE scores (
ID int auto_increment,
student_ID int,
subject_ID int,
score float,
testdate datetime,
scoredate datetime
)
Assume that we have 600 students and 40 classes (sample data omitted for
brevity). In the scores table you can find entries like (1234, 15, 25,
96.7, '2004-09-15 09:00:00', '2004-09-17 12:00:00') which translates to:
In score record #1234, student 15 received a grade in subject 25 on an
exam taken at 9AM on the 15th day of September but the instructor didn't
enter the grade until 2 days and 3 hours later.
Let's say the instructor wanted to average all of his student's grades for
that class:
SELECT student_ID, AVG(score) as avgscore
from scores
GROUP BY student_ID
WHERE subject_ID = 25
We start with data that looks like:
+----------+-----+
|student_ID|Score|
+----------+-----+
| 15 | 100 | -+
| 15 | 90 | |- one student's scores
| 15 | 80 | -+
| 16 | 81 |
| 16 | 85 |
| 16 | 83 |
| 350 | 86 | -+
| 350 | 60 | | - another student's scores
| 350 | 70 | _+
...
+----------+-----+
And because we are "grouping" on student ID we get the behavior of
+----------+----------------+
|student_ID|AVG(score) |
+----------+----------------+
| 15 | AVG(100,90,80) |
| 16 | AVG(81,85,83) |
| 350 | AVG(86,60,70) |
...
+----------+----------------+
which returns the results as:
+----------+--------+
|student_ID|avgscore|
+----------+--------+
| 15 | 90 |
| 16 | 83 |
| 350 | 72 |
...
+----------+--------+
The GROUP BY tells the engine which scores get AVG()-ed together. For each
unique combination of values that exists in the columns specified in the
GROUP BY clause of the query, the engine applies the aggregate function(s)
specified in the SELECT clause. If you neglect to specify an aggregate
function for a column NOT included in your GROUP BY clause, the engine can
pick any (random) value from the list of values that meet that condition.
That is why nearly every other SQL engine does not allow this behavior as
that permits the same query to return different results for multiple
executions against the same base data (the query becomes
"non-deterministic").
SORTING and GROUPING are separate functions. Sorting is merely putting
things into a specified order. Grouping is the formation of virtual "sets"
of data based on certain combinations of data values so that functions
like MIN(), MAX(), AVG(), STD(), and the others will have have specific
data values to work with.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
"bruce" <[EMAIL PROTECTED]> wrote on 09/10/2004 07:58:44 PM:
> paul....
>
> i do a
>
> select ID from hTBL group by type
>
> and i still only get a single row for each type, where i would expect to
get
> the ~5000 rows, grouped around the 3 different types.
>
> what's going on????
>
> thanks...
>
> -bruce
>
>
> -----Original Message-----
> From: Paul DuBois [mailto:[EMAIL PROTECTED]
> Sent: Friday, September 10, 2004 4:36 PM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: group by issue...??
>
>
> At 16:27 -0700 9/10/04, bruce wrote:
> >hi...
> >
> >if i do this...
> >
> >select
> >h1.itemID as hitem,
> >h1.process as process,
> >h1.status as status,
> >h1.tblType as tbl,
> >h1.date as date
> >from historyTBL as h1
> > where (h1.tblType = '3' or h1.tblType = '4');
> >
> >i get a results table with ~5000 rows...
> >
> >if i add the 'group by' sql,
> >
> >select
> >h1.itemID as hitem,
> >h1.process as process,
> >h1.status as status,
> >h1.tblType as tbl,
> >h1.date as date
> >from historyTBL as h1
> > where (h1.tblType = '3' or h1.tblType = '4')
> >group by tblType; <<<<<<<<<<<<<<<<
> >
> >i get a single line for each tblType...
> >
> >i was under the impression that i should get the same number of rows
> >returned, but that they would be grouped together.....
> >
> >can someone provide an explanation as to how this should work, and
what's
> >going on????
>
> If you use GROUP BY with selecting the value of any aggregate functions,
> you achieve the same effect as SELECT DISTINCT.
>
> --
> Paul DuBois, MySQL Documentation Team
> Madison, Wisconsin, USA
> MySQL AB, www.mysql.com
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/[EMAIL PROTECTED]
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
>