On Tue, Sep 18, 2012 at 7:41 PM, Rick James <[email protected]> wrote:
> SELECT ((the appropriate id)) -- <--
> FROM data_cst, rollup, data_target
> WHERE data_target.name = rollup.Target
> AND data_cst.ep = rollup.EP
> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04
> 23:59:59'
> AND data_target.id = data_cst.target_name_id
> GROUP BY rollup.Target, rollup.EP
> ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`)
> LIMIT 1 -- <--
rollup is the name of the temp table that I was testing with. It
contains the output of the original query. I don't have that in the
real world. That's my issue - I can't figure out how to join with the
result set and apply the order by to the join condition. What would I
use in place of rollup to get the 'Avg Bottom'?
Here's what my query looks like now:
SELECT data_target.name as Target,
q1.ep as EP,
COUNT(*) as Wafers,
Lots,
SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns
ELSE 0 END)) as 'Sites/Wafer',
MAX(LastRun) as "Last Run",
SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun,
COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs,
avgbottom as "Avg Bottom",
3*stdbottom as "3 Sig",
maxbottom as Max,
minbottom as Min,
SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs,
100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% Success",
3*stdbottom/avgbottom as "3Sig/Avg",
AVG(avgbottom) as 'Wafer Avg',
AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
AVG(Ranges) as 'Avg Range',
3*STD(Ranges) as '3Sig of Ranges',
MAX(Ranges) as 'Max Range',
MIN(Ranges) as 'Min Range', (SUM(numonep) -
SUM(numbottoms))/COUNT(*) as 'NAs/Wafer',
mx.id as maxID,
mn.id as minID,
lr.id as lrID
FROM (SELECT target_name_id,
ep,
wafer_id,
COUNT(bottom) as numbottoms,
AVG(bottom) as avgbottom,
STD(bottom) as stdbottom,
MAX(bottom) as maxbottom,
MIN(bottom) as minbottom,
MAX(date_time) as "LastRun",
COUNT(*) as numonep,
COUNT(DISTINCT target_name_id, ep, lot_id,
data_file_id)-1 as reruns,
COUNT(DISTINCT(lot_id)) as Lots,
3*STD(bottom) as Wafer3Sigma,
MAX(bottom) - MIN(bottom) as Ranges
FROM data_cst
WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44,
116, 117, 118, 119, 120, 121)
AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND
'2010-03-04 23:59:59'
GROUP BY target_name_id, ep, wafer_id
HAVING count(*) < 999) q1
join data_cst mn ON mn.bottom = q1.minbottom
join data_cst mx on mx.bottom = q1.maxbottom
join data_cst lr on lr.date_time = q1.LastRun,
data_target
WHERE data_target.id = q1.target_name_id
GROUP BY q1.target_name_id, q1.ep;
>
>> -----Original Message-----
>> From: Larry Martell [mailto:[email protected]]
>> Sent: Tuesday, September 18, 2012 2:57 PM
>> To: Rick James
>> Cc: [email protected]; [email protected]
>> Subject: Re: getting certain rows from a group by
>>
>> On Tue, Sep 18, 2012 at 4:01 PM, Rick James <[email protected]>
>> wrote:
>> > SELECT mn.*
>> > mx.*
>> >> > FROM ( SELECT @min = MIN(bottom),
>> >> > @max = MAX(bottom), ... )
>> >> > JOIN data_cst mn ON bottom = @min
>> >> > JOIN data_cst mx ON bottom = @max
>> >
>> > That is, the FROM finds the value for the desired row.
>> > The JOIN then gets to the rest of the fields.
>> > Caveat: If two rows have bottom = @min, it returns both.
>>
>>
>> Thanks a lot Rick! This was super helpful. I've got it working for 3 of
>> the 4 cases - min(bottom), max(bottom), and max(date_time). But I can't
>> figure out how to work in the last case - where bottom is closest to
>> avg(bottom). In an individual query I can get it with an order by, like
>> this:
>>
>> SELECT rollup.Target, rollup.EP, rollup.`Avg Bottom`, data_cst.id,
>> data_cst.bottom FROM data_cst, rollup, data_target WHERE
>> data_target.name = rollup.Target AND data_cst.ep = rollup.EP AND
>> data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04
>> 23:59:59'
>> AND data_target.id = data_cst.target_name_id GROUP BY rollup.Target,
>> rollup.EP ORDER BY ABS(data_cst.bottom - rollup.`Avg Bottom`);
>>
>> Any way to work that into another join?
>>
>> Thanks!
>>
>> >
>> >> -----Original Message-----
>> >> From: Larry Martell [mailto:[email protected]]
>> >> Sent: Tuesday, September 18, 2012 12:54 PM
>> >> To: Rick James
>> >> Cc: [email protected]; [email protected]
>> >> Subject: Re: getting certain rows from a group by
>> >>
>> >> On Tue, Sep 18, 2012 at 2:05 PM, Rick James <[email protected]>
>> >> wrote:
>> >> > For single query...
>> >> >
>> >> > Plan A:
>> >> > Repeat the SELECT as once for each (MIN, AVG, etc) as a subquery
>> in
>> >> > WHERE bottom = ( SELECT MIN(bottom) ... )
>> >> >
>> >> > Plan B:
>> >> > FROM ( SELECT @min = MIN(bottom),
>> >> > @max = MAX(bottom), ... )
>> >> > JOIN data_cst mn ON bottom = @min
>> >> > JOIN data_cst mx ON bottom = @max
>> >>
>> >>
>> >> Can you elaborate on this? I don't see how this will give me what I
>> >> need. I'm not looking for the min or max bottom - I already have
>> that
>> >> - I'm looking for the row from each group that has the min and max
>> >> bottom.
>> >>
>> >> > Plan C:
>> >> > Get rid of 3rd party packages that eventually get in your way
>> >> > instead
>> >> of 'helping'.
>> >>
>> >> It's not django that is getting in the way. django is a fantastic
>> >> framework for web development. It's just changing requirements. The
>> >> original developer (who no longer works here) created a few
>> different
>> >> classes, and then based on the original requirements, chose to use
>> >> the one that supports a single query. That worked then, but may not
>> >> work now. There are other classes that allow multiple queries, but
>> >> then you have to parse the data and stuff it into the context that
>> >> will be picked up by the browser yourself. The single query class
>> >> does that for you. I may yet go that way, but I'm trying to avoid a
>> >> big rewrite if I can.
>> >>
>> >> >> -----Original Message-----
>> >> >> From: Larry Martell [mailto:[email protected]]
>> >> >> Sent: Tuesday, September 18, 2012 4:57 AM
>> >> >> To: [email protected]
>> >> >> Cc: [email protected]
>> >> >> Subject: Re: getting certain rows from a group by
>> >> >>
>> >> >> On Tue, Sep 18, 2012 at 7:52 AM, Peter Brawley
>> >> >> <[email protected]> wrote:
>> >> >> > On 2012-09-18 5:53 AM, Larry Martell wrote:
>> >> >> >>
>> >> >> >> I have this query:
>> >> >> >>
>> >> >> >> SELECT data_target.name as Target,
>> >> >> >> q1.ep as EP,
>> >> >> >> COUNT(*) as Wafers,
>> >> >> >> Lots,
>> >> >> >> SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN
>> >> >> >> reruns ELSE 0 END)) as 'Sites/Wafer',
>> >> >> >> MAX(LastRun) as "Last Run",
>> >> >> >> SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as
>> >> Rerun,
>> >> >> >> COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0
>> >> >> >> END) as
>> >> >> Runs,
>> >> >> >> avgbottom as "Avg Bottom",
>> >> >> >> 3*stdbottom as "3 Sig",
>> >> >> >> maxbottom as Max,
>> >> >> >> minbottom as Min,
>> >> >> >> SUM(numonep) as Count,
>> >> >> >> SUM(numonep) - SUM(numbottoms) as NAs,
>> >> >> >> 100-((SUM(numonep) -
>> SUM(numbottoms))/SUM(numonep)*100)
>> >> as
>> >> >> "%
>> >> >> >> Success",
>> >> >> >> 3*stdbottom/avgbottom as "3Sig/Avg",
>> >> >> >> AVG(avgbottom) as 'Wafer Avg',
>> >> >> >> AVG(Wafer3Sigma) as 'Wafer 3 Sigma',
>> >> >> >> AVG(Ranges) as 'Avg Range',
>> >> >> >> 3*STD(Ranges) as '3Sig of Ranges',
>> >> >> >> MAX(Ranges) as 'Max Range',
>> >> >> >> MIN(Ranges) as 'Min Range',
>> >> >> >> (SUM(numonep) - SUM(numbottoms))/COUNT(*) as
>> 'NAs/Wafer'
>> >> >> >> FROM (SELECT target_name_id,
>> >> >> >> ep,
>> >> >> >> wafer_id,
>> >> >> >> COUNT(bottom) as numbottoms,
>> >> >> >> AVG(bottom) as avgbottom,
>> >> >> >> STD(bottom) as stdbottom,
>> >> >> >> MAX(bottom) as maxbottom,
>> >> >> >> MIN(bottom) as minbottom,
>> >> >> >> MAX(date_time) as "LastRun",
>> >> >> >> COUNT(*) as numonep,
>> >> >> >> COUNT(DISTINCT target_name_id, ep, lot_id,
>> >> >> >> data_file_id)-1 as reruns,
>> >> >> >> COUNT(DISTINCT(lot_id)) as Lots,
>> >> >> >> 3*STD(bottom) as Wafer3Sigma,
>> >> >> >> MAX(bottom) - MIN(bottom) as Ranges
>> >> >> >> FROM data_cst
>> >> >> >> WHERE target_name_id IN (775, 776, 777, 778, 779, 780,
>> >> >> >> 45,
>> >> >> 44,
>> >> >> >> 116, 117, 118, 119, 120, 121)
>> >> >> >> AND data_cst.date_time BETWEEN '2010-03-04 00:00:00'
>> AND
>> >> >> >> '2010-03-04 23:59:59'
>> >> >> >> GROUP BY target_name_id, ep, wafer_id
>> >> >> >> HAVING count(*) < 999) q1,
>> >> >> >> data_target
>> >> >> >> WHERE data_target.id = target_name_id GROUP BY
>> >> >> >> q1.target_name_id, q1.ep;
>> >> >> >>
>> >> >> >>
>> >> >> >> This works fine. But now I need to get a certain column
>> >> >> >> (image_measurer_id) with each row returned that corresponds to
>> >> the
>> >> >> >> row from the group that has bottom = Min(bottom), bottom =
>> >> >> >> Max(bottom), bottom closest to Avg(bottom), and bottom from
>> the
>> >> >> >> row where date_time = Max(date_time).
>> >> >> >>
>> >> >> >> Is this even possible from one query?
>> >> >> >
>> >> >> >
>> >> >> > Might be, but what's the importance of doing it as one query?
>> >> >>
>> >> >> Because it's part of a django based web app, and the class that
>> >> >> this is part of only supports having a single query. To use
>> >> >> multiple queries will require a fairly major rewrite of the
>> server
>> >> >> side of
>> >> that app.
>> >> >>
>> >> >> > I'd start by
>> >> >> > saving this result to a temp table and developing the new
>> query.
>> >> >> > When that's running, see if you can to optimise a query built
>> by
>> >> >> > replacing the reference to the temp table with the original
>> query.
>> >> >>
>> >> >> Thanks, I'll look into this.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql