If I understand your question correctly you want to group results for
statistical analysis. For instance if you have 2000 results (sequenced in
some way) you want to be able to split those results into 10 equal (or
nearly equal) groups of results and re-aggregate your results based on the
new "generated" group number (like a decile or percentile)
Let's say, as an example, you want to know how many scores are in each
decile (tenth) of scores out of a possible maximum of 1600 per test. That
means you want to count how many people scored from 0-159, 160 - 319, ...
, 1440 -1600. There are several ways to do this but I choose to make a
table of my decile ranges (groups) for this example. This also allows you
to modify the size of each group individually.
CREATE TABLE AnalysisGroups (
groupID int not null auto_increment
, grouplow int not null
, grouphigh int not null
, primary key(id)
, key(grouplow, grouphigh)
)
INSERT AnalysisGroups (grouplow, grouphigh)
VALUES (0,159), (160, 319), (320,479 ), (480,639), (640,799), (800,959),
(960,1119), (1120,1279), (1280,1439), (1440,1600)
The test scores are stored in another table. If you want to collect how
many scores fall into each category you just join them together and total
up the matches
SELECT ag.groupID as decile
, count(tr.testID) as scores
FROM AnalysisGroups ag
INNER JOIN TestResults tr
ON tr.score >= ag.grouplow
AND tr.score <= ag.grouphigh
GROUP BY ag.groupID
If you want to group by population you have another trick you can use
similar to this one. First we need to create a table that can store the
results of the query but that also has two additional columns. One of
those columns is an auto_increment column (so that we number each element
of the result) and the second will be which "-ile" (percentile, decile,
quartile) the row falls into.
Sticking with our current example (test scores) assume we need to compute
the average score for each percentile of test takers. We could do this by
first ordering the results from lowest score to highest score (or by any
other useful measure) then dividing the list into 100 equal groups,
eventually averaging the score for each group.
CREATE TEMPORARY TABLE tmpAnalysis(
ID int auto_increment
, percentile int
, testID int
, score float
key(percentile)
);
INSERT tmpAnalysis(testID, score)
SELECT id, score
FROM TestResults
ORDER BY score;
SELECT @Pctl := count(id)/100 from tmpAnalysis;
UPDATE tmpAnalysis
SET percentile = FLOOR((ID-1)/@Pctl) * @Pctl;
SELECT percentile, AVG(score) as pctl_mean
FROM tmpAnalysis
GROUP BY percentile;
DROP TEMPORARY TABLE tmpAnalysis;
I added an "extra" column to the temp table so that you could see that you
could use that table for multiple purposes. Once I have assigned the
percentile numbers to each row, I could have identified which tests (and
which test takers) fell into each percentile.
SELECT tr.taker, tr.score
FROM TestResults tr
INNER JOIN tmpAnalysis a
on a.testID = tr.id
Where a.percentile >= 95
Assuming you haven't dropped the temp table yet, that query will give you
the list of who scored in the top 5% on that particular test.
>>Disclaimer<< I haven't had time to test any of this with live data. If
these examples don't work the first time, it's probably because I made a
typing error. Apologies in advance.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
Jo�o Borsoi Soares <[EMAIL PROTECTED]> wrote on 11/23/2004 06:30:26
AM:
> No body has answered my question so far. Does that mean there is no way
> to retrieve current row numbers in a query or no way to retrieve results
> jumping steps?
>
> I would appreciate any help.
>
> Thanks,
> Joao.
>
> -----Mensagem encaminhada-----
> > From: Jo�o Borsoi Soares <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> > Subject: row numbers, jumping rows
> > Date: Sun, 21 Nov 2004 11:23:41 -0200
> >
> > Hello list,
> >
> > I wonder if someone could help me to get the fastest result on my
query.
> > I need to classify one entity, according to a specific attribute
(let's
> > call it X), looking at a sample extracted from a database table.
> > First I need to sort my sample ordered by a specific table field (X).
> > Then I should divide my sorted sample in 10 equal groups (same number
of
> > rows). Finally, I should classify my entity (1 to 10), finding out at
> > which group its attribute X fits in.
> >
> > Let me explain my thoughts. First I should make a "SELECT COUNT(*).."
to
> > find the total number of rows in my sample. Then I make queries using
> > LIMIT until I find which group the X attribute fits in. In the worst
> > case I will have to do 10 queries. Which I think should take too long
in
> > my case.
> >
> > I wanted to make it in only one query. It could be possible if I could
> > make a query which returns result jumping rows, like "STEP N" where
"N"
> > could be the number of items in each group. Is there anything in mysql
> > that can give me that? I also thought if I could have the row numbers
in
> > the query result, I could solve it with a simple condition like,
"WHERE
> > rowNum % N = 0".
> >
> > Any ideas?
> >
> > Thanks.
> >
> >
> >
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
>