Would using MERGE tables be an option? You would have to split your data, but
then each individual table could stay within MySQL's memory limits. Divide and
conquer. You would need a way to work around your INSERT IGNORE requirement,
but the end result could be much faster.
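
As a sketch of what that layout might look like (table and column names here
are made up for illustration, not taken from your schema):

    -- Underlying MyISAM tables, one per slice of the data:
    CREATE TABLE log_2007a (
        id         BIGINT NOT NULL,
        event_time DATETIME NOT NULL,
        payload    VARCHAR(255),
        KEY (id, event_time)
    ) ENGINE=MyISAM;

    CREATE TABLE log_2007b LIKE log_2007a;

    -- The MERGE table presents the slices as one logical table:
    CREATE TABLE log_all (
        id         BIGINT NOT NULL,
        event_time DATETIME NOT NULL,
        payload    VARCHAR(255),
        KEY (id, event_time)
    ) ENGINE=MERGE UNION=(log_2007a, log_2007b) INSERT_METHOD=LAST;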

The "solution" I use for duplicate filtering is to to create a "load" table 
which loads a set of records, then compares those
records against the merge table for duplicates, deleting any found. Then the 
load table is added to the merge table and the process
is repeated for the next batch of data.
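
Reusing the made-up names from the sketch above, each batch boils down to
something like the following (shown here as an exact match; the range version
I actually need is sketched below):

    -- 1. Load the batch into a scratch table with the same
    --    structure as the underlying tables:
    CREATE TABLE load_batch LIKE log_2007a;
    LOAD DATA INFILE '/tmp/batch.txt' INTO TABLE load_batch;

    -- 2. Drop any rows already present in the merge table:
    DELETE load_batch FROM load_batch
      JOIN log_all ON log_all.id = load_batch.id;

    -- 3. "Add" the batch to the merge table, e.g. by extending
    --    the UNION list (one way among several):
    ALTER TABLE log_all
      UNION = (log_2007a, log_2007b, load_batch);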

Adding 2.5 million rows to a 500-million-row table takes about 2 hours, a
daily occurrence. The bottleneck is that a record is considered a duplicate
if another similar one exists within a 24-hour time period. So it's a range
match rather than a direct comparison, which means I couldn't use INSERT
IGNORE anyway.
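
With the same invented names, the range-match delete looks roughly like this
(whether the window is one-sided or symmetric depends on the exact rule):

    -- A row is a duplicate if a similar row already exists in
    -- the merge table within 24 hours of its timestamp:
    DELETE load_batch FROM load_batch
      JOIN log_all
        ON log_all.id = load_batch.id
       AND log_all.event_time BETWEEN
           load_batch.event_time - INTERVAL 24 HOUR AND
           load_batch.event_time + INTERVAL 24 HOUR;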

Thanks for the link.

----- Original Message ----- From: "Chris Kantarjiev" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <mysql@lists.mysql.com>
Sent: Friday, March 09, 2007 12:42 PM
Subject: Re: what journal options should I use on linux?


Yes, the article did lack a lot of methodology information.

This one is *lots* better.

http://www.bullopensource.org/ext4/sqlbench/index.html

Losing data is always bad - that's why it's in a database, not a filesystem.
But these systems have been pretty reliable and are on UPS, etc. This
is a created table, so it's not life critical ... but it is expensive.

> This might be a silly question, but did you max out the
> myisam_sort_buffer_size and key_buffer_size settings? Both can go up to 4GB.

Yup. Not nearly big enough - index is 15GB!
And the code seems to misbehave, leading to crashes, at the limit.
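
For reference, both are dynamic server variables, so they can be raised at
runtime up to the 4GB ceiling; the values below are illustrative, not
recommendations:

    -- Raise the MyISAM buffers toward the 4GB limit:
    SET GLOBAL key_buffer_size         = 4294967295;
    SET GLOBAL myisam_sort_buffer_size = 4294967295;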

> You can also turn off indexing when loading the data, then turn it back on
> when the data is loaded (if you haven't already).

We "need" INSERT IGNORE, so this isn't really an option for us,
unfortunately.
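
For completeness, the disable-indexes trick referred to above is roughly
this, against a hypothetical table; note that DISABLE KEYS only suspends
non-unique indexes, so unique-key checking stays active:

    -- Suspend non-unique index maintenance during the bulk load,
    -- then rebuild those indexes in one pass afterwards (MyISAM):
    ALTER TABLE big_table DISABLE KEYS;
    -- ... bulk load here ...
    ALTER TABLE big_table ENABLE KEYS;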

I'm going to mount them as ext2fs for the time being.

Best,
chris

