Re: Duplicated records

Rhino Tue, 05 Apr 2005 10:48:08 -0700

Yes, I understand that. But, unless you have a version of MySQL that
supports subqueries (V4.1 or later), it's a two step process. The first
query I gave you will identify all of the batches that are duplicates by
their batch number:


select Batch, count(*) as count
from QA
group by Batch
having count >= 2;

The second query I gave you will tell you the other things you want to know
about the batches:

SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
FROM QA
WHERE Batch in ('439584414', '123456', '999444');

Does it make sense now?

Rhino

----- Original Message ----- 
From: "Scott Hamm" <[EMAIL PROTECTED]>
To: "Rhino" <[EMAIL PROTECTED]>
Cc: "Mysql" <[email protected]>
Sent: Tuesday, April 05, 2005 1:02 PM
Subject: Re: Duplicated records


> Well, to be more specific, I would like to list *ALL* duplicated Batch
> rather than to count them.
>
> SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
> FROM QA
> WHERE Batch in (duplicates);
>
> On Apr 5, 2005 12:43 PM, Rhino <[EMAIL PROTECTED]> wrote:
> > Oh, that's an entirely different problem than the one I understood from
your
> > original question.
> >
> > Your new explanation is clearer but I'm still not sure I follow all of
it.
> > The formatting/wrapping of the example makes things a bit fuzzy too. Am
I
> > right in assuming that on the first row of the result, the OperatorID is
> > 2661, the QAID is 5334 and the NTID is JulieAnt? If so, is JulieAnt the
> > reviewer? Also are the QAID and the NTID different ways of saying the
same
> > thing, such as the fact that JulieAnt's employee number is 5334, or is
it
> > just a coincidence that they are the same on both rows?
> >
> > I'm assuming that a Batch has two different operators if one shift
starts
> > the batch but the batch isn't finished by the time the shift ends so a
> > second operator finishes the Batch. Will you ever have cases where a
Batch
> > takes so long that it takes more than two shifts to finish it? For
instance,
> > would you ever see a case where the company gets an order for a billion
> > grapple grommets and it takes 9 shifts to finish it? I'm trying to
figure
> > out if you also need to worry about batches that are split over more
than
> > two shifts.
> >
> > Does the Batch value uniquely identify a particular Batch or is it the
> > combination of Batch and KeyDate?
> >
> > Ok, assuming for the moment that Batch alone uniquely identifies a
> > particular Batch, and that you care about Batches that are split across
two
> > *or more* shifts, you need a query like this to find those Batches:
> >
> > select Batch, count(*) as count
> > from QA
> > group by Batch
> > having count >= 2;
> >
> > That will return the Batch numbers alone.
> >
> > Then, you simply do a second Select to get the other properties of the
Batch
> > that you care about. That query would look very much like the one in
your
> > example except that you would do it like so:
> >
> > SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
> >  FROM QA
> >  WHERE Batch in ('439584414', '123456', '999444');
> >
> > In other words, you'd put a list consisting of all of the values
returned
> > from the one that got the Batch numbers into the IN clause.
> >
> > Now, that could be a little tedious if there were a lot of 'split'
batches
> > and you may also run into an issue if the number of of values in the IN
> > clause exceeded MySQL's limit, whatever that is. (Or maybe MySQL has a
limit
> > on the maximum *length* of the IN clause; if it does, that could bite
you
> > too.) To get around that, you could just repeat the select with the IN
> > clause for subsets of the results from the first query or even just do
your
> > original query once for each of the values returned by the counting
query.
> >
> > A much better alternative would be to use a subquery but that is only an
> > option if you are running MySQL 4.1 or later. That would let you combine
> > both queries together into a single big query. Are you running V4.1 or
> > later? If you are, I could take a stab at the combined query....
> >
> > Rhino
> >
> >
> > ----- Original Message -----
> > From: "Scott Hamm" <[EMAIL PROTECTED]>
> > To: "Rhino" <[EMAIL PROTECTED]>
> > Cc: "Mysql" <[email protected]>
> > Sent: Tuesday, April 05, 2005 11:14 AM
> > Subject: Re: Duplicated records
> >
> > > Sorry for the confusion.  I don't mean a duplicated records, but to
> > > find duplicated Batch where two different operators worked on a single
> > > batch (one started off, then another one to finish the batch) and a
> > > single reviewer to review a batch. I need a list that lists duplicated
> > > Batches with different operators that worked on that batch.
> > >
> > > For example: (for clarification)
> > > SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
> > > FROM QA
> > > WHERE Batch='439584414';
> > >
> > > ID         Batch              KeyDate                      OperatorID
> > > QAID      NTID
> > > 90577    439584414 2004-10-03 00:00:00 2661       5334 JulieAnt
> > > 90575    439584414 2004-10-03 00:00:00 5657       5334 JulieAnt
> > >
> > >
> > > On Apr 5, 2005 10:54 AM, Rhino <[EMAIL PROTECTED]> wrote:
> > > >
> > > > ----- Original Message -----
> > > > From: "Scott Hamm" <[EMAIL PROTECTED]>
> > > > To: "Mysql" <[email protected]>
> > > > Sent: Tuesday, April 05, 2005 10:05 AM
> > > > Subject: Duplicated records
> > > >
> > > > > Here is my novice question:
> > > > > I need to find duplicates in "Batch" category when I issued this
> > query:
> > > > >
> > > > > SELECT * FROM QA WHERE Batch=439584414;
> > > > >
> > > > > Result:
> > > > > 90577
> > > > > 1 2661 5334 JulieAnt 25 5 5 439584414 2004-10-03 00:00:00
2004-10-03
> > > > > 00:00:00 0 90575
> > > > > 1 5657 5334 JulieAnt 25 5 5 439584414 2004-10-03 00:00:00
2004-10-03
> > > > 00:00:00 0
> > > > >
> > > > > How do I issue a query that finds duplicated Batch number?
> > > > >
> > > > First of all, I think you made a mistake when putting this data into
> > your
> > > > note, a mistake which is compounded by the wrapping that the email
> > program
> > > > did. I *think* you meant to display two rows with the '90577' value
at
> > the
> > > > end of the second row, not before the first row. This makes the
example
> > a
> > > > bit confusing and hard to follow.
> > > >
> > > > The two rows you provide in your example are not duplicates of each
> > other.
> > > > Even assuming that the '90577' actually belongs at the end of the
second
> > row
> > > > rather than the start of the first row, the two rows are different:
the
> > > > first row ends in '90575', not '90577' and the second value in each
row
> > is
> > > > also different: '2661' and '5657'. Therefore, you're not really
trying
> > to
> > > > find duplicate records because a duplicate row would be one that is
> > > > identical in every column, not just identical in several columns. I
hope
> > > > that doesn't sound like I am splitting hairs; maybe I am. ;-)
> > > >
> > > > Anyway, the standard method for finding duplicates of rows goes like
> > this
> > > > and can be adapted for your situation. Let's say that you have a
table
> > > > called Foo with columns foo1, foo2, and foo3 and you want to find
all of
> > the
> > > > duplicate rows in the table, i.e. any row whose foo1, foo2, and foo3
> > value
> > > > is identical to the foo1, foo2, and foo3 value of another table. You
> > would
> > > > write this query:
> > > >
> > > >     select foo1, foo2, foo3, count(*) as num
> > > >     from Foo1
> > > >     group by foo1, foo2, foo3
> > > >     having num > 1
> > > >     order by num;
> > > >
> > > > Your result might look like this:
> > > >
> > > >     foo1    foo2    foo3    num
> > > >     Daffy    D        Duck    1
> > > >     Bugs    B        Bunny    1
> > > >     Elmer    A        Fudd    3
> > > >
> > > > This result means that Daffy D Duck and Bugs B Bunny each occur once
in
> > the
> > > > table but Elmer A Fudd occurs three times.
> > > >
> > > > In the case you mentioned in your email, where the rows aren't
complete
> > > > duplicates of one another, just write the query so that you omit the
> > columns
> > > > which are allowed to be different. In your case, that would appear
to be
> > all
> > > > but the second and last columns.
> > > >
> > > > However, I think if you get to the point of having duplicate records
in
> > a
> > > > table when you don't want them, finding them is like shutting the
barn
> > door
> > > > after the horse is already gone. I think you need to reconsider your
> > design,
> > > > specifically your primary key, and change that primary key to
*PREVENT*
> > the
> > > > duplicate keys in the first place.
> > > >
> > > > For instance, in the case of the Foo1 table, I can prevent duplicate
> > rows by
> > > > choosing a good primary key; in this case I would choose the
combination
> > of
> > > > all three columns in the table; that would ensure that I never
stored
> > more
> > > > than 1 person named Elmer A Fudd. [Actually, that's not a great
example!
> > It
> > > > might be perfectly valid to have two people named Elmer A Fudd - or
John
> > A
> > > > Smith - so I probably need to add additional columns to my table to
> > ensure
> > > > that the rows are unique; something like Social Security Number
which is
> > > > (supposed to be) unique would be ideal for this purpose.]
> > > >
> > > > Rhino
> > > >
> > > > --
> > > > No virus found in this outgoing message.
> > > > Checked by AVG Anti-Virus.
> > > > Version: 7.0.308 / Virus Database: 266.9.2 - Release Date:
05/04/2005
> > > >
> > > >
> > >
> > >
> > > --
> > > Power to people, Linux is here.
> > >
> > > --
> > > MySQL General Mailing List
> > > For list archives: http://lists.mysql.com/mysql
> > > To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]
> > >
> > >
> > > --
> > > No virus found in this incoming message.
> > > Checked by AVG Anti-Virus.
> > > Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005
> > >
> > >
> >
> > --
> > No virus found in this outgoing message.
> > Checked by AVG Anti-Virus.
> > Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005
> >
> >
>
>
> -- 
> Power to people, Linux is here.
>
>
> -- 
> No virus found in this incoming message.
> Checked by AVG Anti-Virus.
> Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005
>



-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Duplicated records

Reply via email to