>>>>> "Frank" == Frank Vetesi <[EMAIL PROTECTED]> writes:
Frank> I have a situation where a query using indexing takes about 2x longer
Frank> than non-indexed to complete. I'd greatly appreciate any suggestions as
Frank> to how to improve the results or, if not possible, an explanation for
Frank> the results (ex. My data represents a worst-case scenario for MySQL
Frank> indexing algorithm.)
Frank> Table Details----------
Frank> -Schema:
Frank> timestamp CHAR(14) not null,
Frank> f2 CHAR(2) not null,
Frank> category CHAR(12) not null,
Frank> f4 CHAR(10) not null,
Frank> f5 CHAR(13) not null,
Frank> f6 CHAR(2) not null,
Frank> f7 CHAR(5)
Frank> -timestamp is of the form yyyymmddhhmmss
Frank> -Table comprised of 3.5 million records of production data.
Frank> -timestamp has 1 of 86,400 unique values
Frank> -category has 1 of ~9,000 unique values
Frank> -Table is loaded using INFILE, then the indexing timestamp and category
Frank> fields are indexed separately.
Frank> -The raw data has been pre-sorted on the timestamp field in ascending
Frank> order.
Frank> Test Query-----------
Frank> select timestamp,f4 from TABLE_NAME where
Frank> timestamp between '19981117000000' and '19981117010000'
Frank> and category = 'ABC'
Frank> Returns 12,685 records in 58 seconds non-indexed and 114 seconds indexed
Frank> averaged over ten iterations of the test query.
The range optimizer in MySQL checks how many rows the query will
probably match and if the index will match less than 25 % it will use
indexes else table scanning.
(you can change this by editing the following row in sql/select.cc:
if (limit < head->keyfile_info.records+head->keyfile_info.deleted)
records=head->keyfile_info.records+head->keyfile_info.deleted;
else if ((records=(head->keyfile_info.records+
head->keyfile_info.deleted)/4) < 2)
DBUG_RETURN(0); /* No nead for quick select */
Just change the '4' to something bigger.
Index are generally slower than table scanning as there will be more seeks
involved, but has the benefit that the server doesn't have to examine all
rows. It's very hard to estimate exactly when table scanning is
better than using indexes.
<cut>
Frank> -Changed timestamp from CHAR to DATETIME (thought: more efficient type
Frank> = better performance WRONG! though it did reduce table space usage by
Frank> 11%)
Frank> Query returned in 332 secs. non-indexed vs. 162 secs. indexed.
I just got the same information earlier today; The problem is that
converting a longlong to a number is awfully slow on pentium.
I just posted a small patch to [EMAIL PROTECTED] that makes datetime 20 % faster,
but there is still other things that should be optimized sometimes.
Frank> Notes--------
Frank> -I tried the same test using a very popular commercial database (VPCDB)
Frank> with minimal tuning (I'm not a DBA just yet) using the same client,
Frank> host, perl, DBI, environment, etc. and received the same results (ie.
Frank> index query slower than non-indexed query for large table), though MySQL
Frank> was faster :-) Those stats:
Frank> MySQL: Non-indexed 58 vs. Indexed 114 secs
Frank> VPCDB: Non-Indexed 118 vs. Indexed 167 secs
Frank> -My application requires speed (it will be web-based) but not
Frank> transaction processing (though robust backup/recovery (hot?) is
Frank> important).
Frank> -Not sure if executable was complied statically / mit-threads / etc.
Frank> since compile log file missing.
A suggestion:
Change the index on category to '(category,timestamp)'
Regards,
Monty
-----------------------------------------------------------
Send a mail to [EMAIL PROTECTED] with
unsubscribe mysql [EMAIL PROTECTED]
in the body of the message to unsubscribe from this list.