At 11:15 PM 11/19/2005, Rhino wrote:
----- Original Message ----- From: "mos" <[EMAIL PROTECTED]>
To: <mysql@lists.mysql.com>
Sent: Saturday, November 19, 2005 8:16 PM
Subject: Is Load Data Infile or Update is faster?
I am doing a balance line comparison between the rows of an existing
table and a text file that has newer data in it. So I'm comparing the
values field by field to the existing rows in the table. If any of the
field values are different, I need to update the table with these new
values. The table has around 25 million rows and usually only 1% of the
table needs to be compared.
I've found 2 ways to update the table with the new values:
1) I could write the new values to a text file and then use "Load Data
InFile REPLACE ..." which will replace the existing rows for the rows
that need changing. The problem of course the REPLACE option means it
will look up the old row using the primary key/unique key, deletes the
row, then adds the new row. This is disk intensive.
2) The other option would be to execute an Update for each row that needs
changing and set the changed columns individually. This means the
existing row will not have to be deleted and only some of the existing
row value(s) are changed. The problem is there could be 10,000 to 100,000
rows that need changing.
So which option is going to be faster? A Load Data Infile that deletes
the old row and adds a new one, or thousands of Updates changing only 1
to 6 values at a time?
Any answer we could give you on the basis of the information you have
provided would be based largely on assumptions that may not be true in
your particular case. You haven't said a word about your hardware or
database design
or whatever indexes are on your data, listing only three of the more
obvious factors that you have omitted, any of which could have huge
impacts on the answer. Even if you told us all of that, the performance
experts could probably only ballpark the answer.
Wouldn't it be much, _much_ better if you did your own benchmark, using
real data, on your own hardware and with your own database design and
indexes to see which alternative would really work faster?
Sure but first I wanted to bounce if off the members in this group to see
what they thought before I started writing code. I thought executing
10,000-100,000 Update queries would be slower than executing one Load Data
Infile Replace query because each line being loaded by Load Data is
replaced by an Update query. But if you think the queries could be faster
(depending on # of indexes being used), I'll write a benchmark program and
find out. I just didn't want to waste an afternoon if I didn't have to.
Mike
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]