Paul,

These opinions reflect my own experience; your mileage may vary.
I think the "fastest" way would depend on your source data. For most data dumps I use LOAD DATA INFILE, as it will read both delimited and fixed-width text. When applied to a table with NO indexes (indices?) on it, I get the best results translating data files into a MySQL database. Now, how to detect dupes and not break your primary keys.....

There are several ways of checking the integrity of your source data. You ask:

    "If inserted records have a duplicate SSN I don't want to insert those
    but put them in a duplicate table or flag them"

and

    "Insert the row and ignore duplicate using insert into master ( .)"

Those are similar in what you are trying to do. Let's look at flagging or re-logging duplicate source rows first.

Unless you are absolutely certain that you can identify a primary key, it is very useful to apply some "artificial" uniqueness constraint to your random source data. I typically do this (when I need to) by importing the data into a table with an auto_incrementing ID field. That way I can easily identify each and every row in my source data. Using this field it is rather simple to collect which rows duplicate themselves in the columns you need to have unique:

    SELECT <column list>
    FROM import_table
    GROUP BY <column list>
    HAVING COUNT(1) > 1

where <column list> represents the list of columns in which you do not want data duplicated.

Copying those rows to another table takes two steps: CREATE a list of your duplicated primary key columns, then INSERT the duplicated rows into the other table:

    CREATE TEMPORARY TABLE tmpDupes
    SELECT <column list>
    FROM import_table
    GROUP BY <column list>
    HAVING COUNT(1) > 1

    INSERT duplicated_table (<destination column list>)
    SELECT <source column list>
    FROM import_table
    INNER JOIN tmpDupes
        ON ... AND ... AND ...
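To make the two-step approach concrete, here is a minimal runnable sketch of the same idea, using Python's built-in sqlite3 module in place of MySQL (so SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE at the merge step). All table and column names here (ssn, name, duplicated_table, master) are illustrative assumptions, not Paul's actual schema:

```python
# Sketch, assuming a sample schema: auto-increment id on the import table,
# GROUP BY ... HAVING COUNT(1) > 1 to find duplicated keys, a join to copy
# the offending rows out, then an IGNORE-style merge into master.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Import table with an artificial auto-incrementing id, so every source
# row can be individually identified.
cur.execute("""CREATE TABLE import_table (
                   id   INTEGER PRIMARY KEY AUTOINCREMENT,
                   ssn  TEXT,
                   name TEXT)""")
cur.executemany("INSERT INTO import_table (ssn, name) VALUES (?, ?)", [
    ("111-11-1111", "Alice"),
    ("222-22-2222", "Bob"),
    ("111-11-1111", "Alicia"),   # duplicate SSN
])

# Step 1: CREATE a list of the duplicated key values.
cur.execute("""CREATE TEMPORARY TABLE tmpDupes AS
                   SELECT ssn FROM import_table
                   GROUP BY ssn HAVING COUNT(1) > 1""")

# Step 2: INSERT every row sharing one of those keys into the dupes table.
cur.execute("CREATE TABLE duplicated_table (id INTEGER, ssn TEXT, name TEXT)")
cur.execute("""INSERT INTO duplicated_table (id, ssn, name)
                   SELECT i.id, i.ssn, i.name
                   FROM import_table AS i
                   INNER JOIN tmpDupes AS d ON i.ssn = d.ssn""")
cur.execute("DROP TABLE tmpDupes")

# Merge into master without breaking its unique key; SQLite's
# INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE.
cur.execute("""CREATE TABLE master (
                   id   INTEGER PRIMARY KEY AUTOINCREMENT,
                   ssn  TEXT UNIQUE,
                   name TEXT)""")
cur.execute("""INSERT OR IGNORE INTO master (ssn, name)
                   SELECT ssn, name FROM import_table""")

cur.execute("SELECT COUNT(*) FROM duplicated_table")
print(cur.fetchone()[0])   # 2 -- both rows carrying the duplicated SSN
cur.execute("SELECT COUNT(*) FROM master")
print(cur.fetchone()[0])   # 2 -- the second '111-11-1111' row was ignored
```

Against a real MySQL server you would issue the CREATE TEMPORARY TABLE / INSERT statements exactly as written above this sketch; only the OR in "INSERT OR IGNORE" is SQLite-specific syntax.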
    (one comparison per column in tmpDupes against the same column in import_table)

    DROP TABLE tmpDupes

To merge your imported data with your existing data in such a way as to NOT break your existing primary keys, you can use INSERT with the IGNORE keyword. I will sometimes add keys to my import table (after it is loaded) to speed up this processing, then drop them again before the next batch import.

To flag your import data, you will have to ALTER TABLE ... ADD a new column to it, then run an update query. I suggest you create a temp table for your duplicate keys (as I did above) so that you can JOIN that list back into your import data to quickly update the new flag field for only those rows that are duplicated.

I am sorry I could not be more specific, but you did ask a VERY generic question.

Yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Paul Chu" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
06/18/2004 10:02 PM
Subject: RE: Fastest way to load a master table removing duplicates

Appreciate any help at all

Thanks,
Paul

-----Original Message-----
From: Paul Chu [mailto:[EMAIL PROTECTED]]
Sent: Friday, June 18, 2004 10:16 AM
To: [EMAIL PROTECTED]
Subject: Fastest way to load a master table removing duplicates

Hi all,

I want to load a Master table with millions of rows from other sources, with a unique index on e.g. ssn (social security number). If inserted records have a duplicate SSN, I don't want to insert those but put them in a duplicate table or flag them. The primary key will be an auto-increment field. There will be other indexes, such as zipcode.

What is the fastest way to load these rows and remove duplicates? Assume I load the rows to be inserted into another table.

1. Check if the ssn already exists before inserting the row?
2. Insert the row and ignore duplicates using insert into master ( .) select ..
From load data, I have lots of files with data that can be saved to load tables and then inserted into the master table.

Regards,
Paul

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]