Maybe increasing

#use for when mysql is doing a check or repair
set-variable    = myisam_sort_buffer_size=64M

to a higher value will make the index happen faster on the fly.

But, for a 100 million row table doing a dump and adding that dump back
to the db might be your fastest method. Building the index at insertion
for a self balancing tree is a faster in some cases (I believe this is
the case) then building one on the fly. 

My 2 cents. 100 million rows WOW.

-->-----Original Message-----
-->From: Paul DuBois [mailto:[EMAIL PROTECTED]
-->Sent: Tuesday, July 08, 2003 8:09 PM
-->To: Florian Weimer; [EMAIL PROTECTED]
-->Subject: Re: Faster reindexing
-->
-->At 9:39 +0200 7/7/03, Florian Weimer wrote:
-->>I've got a table with 100 million rows and need some indexes on it
-->>(one row is 126 bytes).
-->>
-->>I'm currently using MyISAM and the indexing proceeds at an
-->>astonishingly low rate: about 200 MB per hour.  This is rate is far
-->>too low; if we had to recover the database for some reason, we'd
have
-->>to wait for days.
-->>
-->>The table looks like this:
-->>
-->>CREATE TABLE flows (
-->>    version    CHAR NOT NULL,
-->>    router     CHAR(15) NOT NULL,
-->>    src_ip     CHAR(15) NOT NULL,
-->>    dst_ip     CHAR(15) NOT NULL,
-->>    protocol   TINYINT UNSIGNED NOT NULL,
-->>    src_port   MEDIUMINT UNSIGNED NOT NULL,
-->>    dst_port   MEDIUMINT UNSIGNED NOT NULL,
-->>    packets    INTEGER UNSIGNED NOT NULL,
-->>    bytes      INTEGER UNSIGNED NOT NULL,
-->>    src_if     MEDIUMINT UNSIGNED NOT NULL,
-->>    dst_if     MEDIUMINT UNSIGNED NOT NULL,
-->>    src_as     MEDIUMINT UNSIGNED NOT NULL,
-->>    dst_as     MEDIUMINT UNSIGNED NOT NULL,
-->>    src_net    CHAR(1) NOT NULL,
-->>    dst_net    CHAR(1) NOT NULL,
-->>    direction  CHAR(1) NOT NULL,
-->>    class      CHAR(1) NOT NULL,
-->>    start_time CHAR(24),
-->>    end_time   CHAR(24)
-->>);
-->>
-->>Indexes are created using this statement:
-->>
-->>mysql> ALTER TABLE flows
-->>     -> ADD INDEX dst_ip (dst_ip, src_ip),
-->>     -> ADD INDEX dst_port (dst_port, start_time),
-->>     -> ADD INDEX src_ip (src_ip, start_time),
-->>     -> ADD INDEX time (start_time);
-->>
-->>In theory, we could represent the columns router, src_ip, dst_ip,
-->>start_time, end_time using integers of the appropriate size, but
this
-->>would make ad-hoc queries harder to type (and porting our
applications
-->>would be even more difficult).
-->
-->Perhaps, but as a test, you might add a couple of extra columns to
-->the table, then populate them like this after loading the table:
-->
-->UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
-->INET_ATON(dst_ip);
-->
-->Then try creating the indexes using int_src_ip and int_dst_ip rather
-->than src_ip and dst_ip.
-->
-->If it's significantly faster, you may want to reconsider whether it
might
-->not be worth using INET_ATON(X) in your queries rather than X.
-->
-->>
-->>Should I switch to another table type?
-->
-->It's easy enough to convert the table to, e.g., InnoDB and then
-->create the indexes, so an empirical test should not be difficult.
-->
-->--
-->Paul DuBois, Senior Technical Writer
-->Madison, Wisconsin, USA
-->MySQL AB, www.mysql.com
-->
-->Are you MySQL certified?  http://www.mysql.com/certification/
-->
-->
-->--
-->MySQL General Mailing List
-->For list archives: http://lists.mysql.com/mysql
-->To unsubscribe:
-->http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to