Steve Revilak wrote:
>> Basically I am dealing with using, storing, and sorting a LOT of data in a
>> mysql database.
>> With all the data in the table it makes for 404.8 Million rows.  In a backup
>> sql file that makes just under 80GB.
>>     
OK, so you're dealing with a tiny database.

mysql> select count(*) from the_log;
+-----------+
| count(*)  |
+-----------+
| 449349136 |
+-----------+
1 row in set (0.00 sec)


Real table name changed to protect the innocent. The row size is around 
128 bytes. I have so few rows right now because I just purged 2005 data.

>> I am using the InnoDB engine.
>>     
Could be good, but probably a mistake. MyISAM is much more suited to 
large amounts of data. Unless you need transactional integrity on this 
particular data, I would dump InnoDB ASAP and go with MyISAM. If you 
*think* you need transactional integrity, consider the possibility that 
you might not, and that what you might actually need is transactional 
integrity on the smaller quantity of data that you have in other tables. 
Often such a large table represents some sort of external data: a log 
file that is loaded from an external source, or a list of users that is 
almost exclusively inserted into. For both situations, there are better 
solutions than the overkill of InnoDB.
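
If you do switch, the conversion itself is a one-liner, though on a table 
this size it will rewrite the whole thing, so plan for it to run a long 
time (I'm reusing the_log since the real name was changed anyway):

mysql> ALTER TABLE the_log ENGINE=MyISAM;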

> If you haven't normalized the pants out of the table, that's probably
> a good place to start.  Anything you can do to trim a few bytes off
> the row size will make a big difference in the size of the overall
> table.  (Is this table part of a data warehouse?).  Smaller data will
> always be faster than bigger data.
>   

I don't mean any offense, but I could not disagree more with ... well, 
every sentence. First off, normalization is the antithesis of 
performance, as it almost always results in more joins. It's the right 
thing to do in many cases, but think twice before "normalizing the pants 
out of the table" for performance. It won't be a win.

The row size is ignorable, unless you've constructed your data and your 
indexes poorly. If you need to fetch every record, then of course, you 
want to have as little data as possible, but if you just need to fetch 
key records or compute summary information, then row size is far from 
your most important concern.

Smaller data vs. bigger data is orders of magnitude less important than 
less data vs. more data, and the way you reduce the AMOUNT of data is to 
use indexes wisely. This reduces the amount of data that any given query 
must interact with, rather than trying to reduce the data in the table.
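
For example (column name made up; substitute whatever your queries 
actually filter on), if most queries hit a date range, an index on that 
column lets MySQL touch one month's worth of rows instead of all 400-odd 
million:

mysql> ALTER TABLE the_log ADD INDEX idx_logged_at (logged_at);
mysql> EXPLAIN SELECT COUNT(*) FROM the_log
    ->   WHERE logged_at >= '2006-01-01' AND logged_at < '2006-02-01';

The rows estimate in the EXPLAIN output should drop from ~450 million to 
roughly the number of rows in that month. Building the index on 80GB of 
data will itself take a while, of course.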

You can also, as one other person pointed out, use merge tables (a 
MyISAM feature) to break the data up, but with only 80GB of data, I 
would not worry about that just yet.
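
For the archives, a merge table looks roughly like this (two-column 
schema made up for illustration; all the underlying tables must be 
identical MyISAM tables, and I'm assuming a the_log_2005 built the same 
way already exists):

mysql> CREATE TABLE the_log_2006 (
    ->   logged_at DATETIME NOT NULL,
    ->   message   VARCHAR(255) NOT NULL,
    ->   KEY (logged_at)
    -> ) ENGINE=MyISAM;
mysql> CREATE TABLE the_log_merged (
    ->   logged_at DATETIME NOT NULL,
    ->   message   VARCHAR(255) NOT NULL,
    ->   KEY (logged_at)
    -> ) ENGINE=MERGE UNION=(the_log_2005, the_log_2006) INSERT_METHOD=LAST;

Queries against the_log_merged see all the years at once, inserts go into 
the last table in the UNION, and dropping an old year is just dropping a 
table.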

Other things to look at:

The tuning parameters for the database. Ideally, you want to be able to 
suck all of the indexes for your giant table into RAM. That means you 
need lots of RAM, and it also means that you need to hack the MySQL 
configs. There's a great book on MySQL tuning from O'Reilly that I 
suggest you check out.
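
If you go the MyISAM route, the main knob is key_buffer_size, which 
caches index blocks. A rough sketch (the 2GB figure is a guess; in 
practice you would set this in my.cnf rather than at runtime):

mysql> SHOW TABLE STATUS LIKE 'the_log'\G
mysql> SET GLOBAL key_buffer_size = 2147483648;

Index_length in the SHOW TABLE STATUS output tells you how big the 
indexes really are; aim to get key_buffer_size at or above that, within 
reason. If you stay on InnoDB, the analogous setting is 
innodb_buffer_pool_size, which has to hold the data pages as well, not 
just the indexes.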

 