Wallace is right: data warehousing shouldn't delete any data. MySQL isn't as robust as, say, Oracle when it comes to partitioning, so you need to fudge things a little. I think partitioning is the way to go, and you should use MERGE tables to handle your partitions. Really, what you are looking to do is create a 48-hour view, or whatever time frame you want. You can retain all the data and just contain a subset of it in a MERGE table. If you break your tables out into set intervals, you can modify which tables are contained in a MERGE table on the fly. It's instantaneous, since you are really only modifying a table description, not the physical structure. You can either reference the tables directly or reference the "set" through the MERGE table.
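
For example, a rough sketch of that setup (table and column names here are made up, not from your schema) could be one MyISAM table per day plus a MERGE table over the current 48-hour window:

  CREATE TABLE samples_20070726 (
    sampled_at  DATETIME NOT NULL,
    metric_id   INT      NOT NULL,
    value       DOUBLE   NOT NULL,
    KEY (sampled_at)
  ) ENGINE=MyISAM;

  CREATE TABLE samples_20070727 LIKE samples_20070726;

  -- The 48-hour "view": a MERGE table over the two daily tables.
  -- All underlying tables must be MyISAM with identical definitions.
  CREATE TABLE samples_recent (
    sampled_at  DATETIME NOT NULL,
    metric_id   INT      NOT NULL,
    value       DOUBLE   NOT NULL,
    KEY (sampled_at)
  ) ENGINE=MERGE UNION=(samples_20070726, samples_20070727)
    INSERT_METHOD=LAST;

  -- Rotating the window only touches the table definition, so it is
  -- effectively instantaneous:
  CREATE TABLE samples_20070728 LIKE samples_20070726;
  ALTER TABLE samples_recent
    UNION=(samples_20070727, samples_20070728);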

There are certain "gotchas" with MERGE tables you need to be aware of (read the manual).

An additional option is to use a "feeder" table that would be InnoDB; your main tables would be MyISAM. Every 5 minutes you would grab the "old" data from the InnoDB table, insert it into the MyISAM/MERGE table setup, and then delete those records from the InnoDB table. The feeder would only ever contain at most 10 minutes' worth of data, so adding and deleting should be quick.
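
Roughly, the five-minute flush could look like this (again just a sketch with made-up names, run from cron; samples_feeder is the InnoDB table taking the live inserts):

  SET @cutoff := NOW() - INTERVAL 5 MINUTE;

  -- Move anything older than the cutoff into the current MyISAM table...
  INSERT INTO samples_20070727
    SELECT * FROM samples_feeder
    WHERE sampled_at < @cutoff;

  -- ...then clear those rows out of the feeder, which stays small.
  DELETE FROM samples_feeder
    WHERE sampled_at < @cutoff;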

I've used both of these techniques, among others, to manage large, high-insert tables, although I only ever reached about 500 million records over a one-month period.
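
As for the roll-up Wallace suggests below (and the 30-minute summary rows you describe), MySQL has no real materialized views, so the usual substitute is a summary table that you populate on a schedule. A rough sketch, assuming a pre-created samples_30min table and a job run every 30 minutes (names again made up):

  INSERT INTO samples_30min (period_start, metric_id, avg_value, max_value)
  SELECT
    -- Truncate each timestamp to the start of its 30-minute window.
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(sampled_at) / 1800) * 1800),
    metric_id,
    AVG(value),
    MAX(value)
  FROM samples_recent
  WHERE sampled_at >= NOW() - INTERVAL 30 MINUTE
  GROUP BY 1, 2;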

On Jul 26, 2007, at 6:17 PM, Andrew Armstrong wrote:

I've already chosen granularity for samples.

5 minute samples for example can expire after two days.

Aggregated/summarized rows of this data (a 30-minute sample, for example,
which is the aggregation of the past 30 minutes' worth of 5-minute samples
in a given window) expire after a week, etc.

I'm more concerned as to why inserts begin to slow down so much due to the
large table size.

-----Original Message-----
From: Wallace Reis [mailto:[EMAIL PROTECTED]
Sent: Friday, 27 July 2007 1:02 AM
To: Andrew Armstrong
Cc: mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

On 7/26/07, Andrew Armstrong <[EMAIL PROTECTED]> wrote:
Do you have a suggestion as to how this should be implemented?
Data is aggregated over time and summary rows are created.

I think that you didn't design your DW correctly.
It should have just one very large table (the fact table).
Data should never be deleted. What if your client wants to query the
'five minute sample' data after it has already expired?
You should decide on the data's granularity. And if you want to
aggregate it, do a roll-up, or you can create materialized views for
these aggregates.

--
wallace reis/wreis
Núcleo de Biologia Computacional e
Gestão de Informações Biotecnológicas/LABBI
