Maybe increasing #use for when mysql is doing a check or repair set-variable = myisam_sort_buffer_size=64M
to a higher value will make the index happen faster on the fly. But, for a 100 million row table doing a dump and adding that dump back to the db might be your fastest method. Building the index at insertion for a self balancing tree is a faster in some cases (I believe this is the case) then building one on the fly. My 2 cents. 100 million rows WOW. -->-----Original Message----- -->From: Paul DuBois [mailto:[EMAIL PROTECTED] -->Sent: Tuesday, July 08, 2003 8:09 PM -->To: Florian Weimer; [EMAIL PROTECTED] -->Subject: Re: Faster reindexing --> -->At 9:39 +0200 7/7/03, Florian Weimer wrote: -->>I've got a table with 100 million rows and need some indexes on it -->>(one row is 126 bytes). -->> -->>I'm currently using MyISAM and the indexing proceeds at an -->>astonishingly low rate: about 200 MB per hour. This is rate is far -->>too low; if we had to recover the database for some reason, we'd have -->>to wait for days. -->> -->>The table looks like this: -->> -->>CREATE TABLE flows ( -->> version CHAR NOT NULL, -->> router CHAR(15) NOT NULL, -->> src_ip CHAR(15) NOT NULL, -->> dst_ip CHAR(15) NOT NULL, -->> protocol TINYINT UNSIGNED NOT NULL, -->> src_port MEDIUMINT UNSIGNED NOT NULL, -->> dst_port MEDIUMINT UNSIGNED NOT NULL, -->> packets INTEGER UNSIGNED NOT NULL, -->> bytes INTEGER UNSIGNED NOT NULL, -->> src_if MEDIUMINT UNSIGNED NOT NULL, -->> dst_if MEDIUMINT UNSIGNED NOT NULL, -->> src_as MEDIUMINT UNSIGNED NOT NULL, -->> dst_as MEDIUMINT UNSIGNED NOT NULL, -->> src_net CHAR(1) NOT NULL, -->> dst_net CHAR(1) NOT NULL, -->> direction CHAR(1) NOT NULL, -->> class CHAR(1) NOT NULL, -->> start_time CHAR(24), -->> end_time CHAR(24) -->>); -->> -->>Indexes are created using this statement: -->> -->>mysql> ALTER TABLE flows -->> -> ADD INDEX dst_ip (dst_ip, src_ip), -->> -> ADD INDEX dst_port (dst_port, start_time), -->> -> ADD INDEX src_ip (src_ip, start_time), -->> -> ADD INDEX time (start_time); -->> -->>In theory, we could represent the columns router, src_ip, dst_ip, -->>start_time, end_time using integers of the appropriate size, but this -->>would make ad-hoc queries harder to type (and porting our applications -->>would be even more difficult). --> -->Perhaps, but as a test, you might add a couple of extra columns to -->the table, then populate them like this after loading the table: --> -->UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = -->INET_ATON(dst_ip); --> -->Then try creating the indexes using int_src_ip and int_dst_ip rather -->than src_ip and dst_ip. --> -->If it's significantly faster, you may want to reconsider whether it might -->not be worth using INET_ATON(X) in your queries rather than X. --> -->> -->>Should I switch to another table type? --> -->It's easy enough to convert the table to, e.g., InnoDB and then -->create the indexes, so an empirical test should not be difficult. --> -->-- -->Paul DuBois, Senior Technical Writer -->Madison, Wisconsin, USA -->MySQL AB, www.mysql.com --> -->Are you MySQL certified? http://www.mysql.com/certification/ --> --> -->-- -->MySQL General Mailing List -->For list archives: http://lists.mysql.com/mysql -->To unsubscribe: -->http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]