Mysql 4 I have a slightly tougher than usual condition in which I want to remove duplicates. I am not sure how to deal with this and am looking for your suggestions.
The table that stores my data is called addresses, it contains among others, a field called email_address. Within this table emails are grouped on a unique id, so for example, select email_address from addresses where group_id = '5' AND user_id = '2' would show me all the addresses that I want to work on. The case is that users will always be adding more addresses to this group, this is for a mailing list manager I am working on. The trouble is that I suspect users will upload a batch of addresses, then a few weeks later, they will upload a new set, but they will contain the old set as well. This would pile up the duplicates in short order, and I don't want multiple emails sent to the same person over and over again. My first option is when they upload new addresses, to select and test for the existence of that address. If it exists, do not add it, otherwise I will add it. The size of some of these lists are large, in the 10's of thousands. I suspect this will add too much overhead to the import time. I can not make the column unique as there is good reason to have the email address in the column more than once, since they are "groups" of email addresses. My thought is to allow the import of all the addresses, allow all the dupes, then take out the dupes, I suspect this will be faster than a select for every email address I want to import. I think this involves selecting distinct() into a temp table, deleting the addresses from the main table, then selecting into the old table from the temp table and then destroying the temp table. If anyone can suggest a tricky way to do this with perhaps a group by clause to simpy remove the dupes in one go, I would love to hear it. In regards to the temp table, is it up to me to maintain a unique temp table name to not collide in the event 2 users were to hit the page at the same time? And now, the other "rub".... Another field in the addresses table I used to track bounced emails, lets call it 'bounces', which I increment by 1 on every bounce. Here is the other issue, I can not simply remove the dupes without first determining which dupe to remove. Basically, I want to remove all dupes where the bounce count is 0 (default), but if there are more than 2 dupes, I want to keep the one with the highest bounce count. Any idea how I should be approaching this? -- ------------------------------------------------------------- Scott Haneda Tel: 415.898.2602 http://www.newgeo.com Fax: 313.557.5052 [EMAIL PROTECTED] Novato, CA U.S.A. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]