Yep, thanks.

I've been looking at MySQL's partitioning in 5.1, which seems like the
best choice. It's not GA yet, but I may consider using it anyway, as it
appears to be working fine at the moment.
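
For reference, a minimal sketch of what that might look like (the
table and column names are made up for illustration):

    CREATE TABLE samples (
        sampled_at DATETIME NOT NULL,
        metric_id  INT NOT NULL,
        value      DOUBLE NOT NULL
    )
    PARTITION BY RANGE (TO_DAYS(sampled_at)) (
        PARTITION p20070726 VALUES LESS THAN (TO_DAYS('2007-07-27')),
        PARTITION p20070727 VALUES LESS THAN (TO_DAYS('2007-07-28')),
        PARTITION p20070728 VALUES LESS THAN (TO_DAYS('2007-07-29'))
    );

    -- Expiring a day of samples is then a metadata operation
    -- rather than a row-by-row DELETE:
    ALTER TABLE samples DROP PARTITION p20070726;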

-----Original Message-----
From: Brent Baisley [mailto:[EMAIL PROTECTED] 
Sent: Friday, 27 July 2007 10:23 AM
To: Andrew Armstrong
Cc: 'Wallace Reis'; mysql@lists.mysql.com
Subject: Re: Data Warehousing and MySQL vs PostgreSQL

Wallace is right, a data warehouse shouldn't delete any data. MySQL
isn't as robust as, say, Oracle when it comes to partitioning, so you
need to fudge things a little. I think partitioning is the way to go,
and you should use MERGE tables to handle your partitions. Really,
what you are looking to do is create a 48-hour view, or whatever time
frame you want. You can retain all the data and just contain a subset
of it in a MERGE table.

If you break out your tables into set intervals, you can modify which
tables are contained in a MERGE table on the fly. It's instantaneous,
since you are really only modifying a table description, not the
physical structure. You can either reference the tables directly or
reference the "set" through the MERGE table.
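
As a rough sketch of the idea (table names here are just
illustrative; the underlying tables must all be identical MyISAM
tables):

    -- One MyISAM table per interval, all with the same definition:
    CREATE TABLE samples_20070726 (
        sampled_at DATETIME NOT NULL,
        value      DOUBLE NOT NULL,
        KEY (sampled_at)
    ) ENGINE=MyISAM;

    -- The MERGE table is the rolling "48 hour view":
    CREATE TABLE samples_recent (
        sampled_at DATETIME NOT NULL,
        value      DOUBLE NOT NULL,
        KEY (sampled_at)
    ) ENGINE=MERGE UNION=(samples_20070725, samples_20070726)
      INSERT_METHOD=LAST;

    -- Rolling the window forward only rewrites the table
    -- description, so it is effectively instantaneous:
    ALTER TABLE samples_recent
        UNION=(samples_20070726, samples_20070727);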

There are certain "gotchas" with MERGE tables you need to be aware of  
(read the manual).

An additional option is to use a "feeder" table that would be InnoDB.
Your main tables would be MyISAM. Every 5 minutes you would grab the
"old" data from the InnoDB table and insert it into the MyISAM/MERGE
table setup, then delete those records from the InnoDB table. The
feeder would only ever contain at most 10 minutes' worth of data, so
adding and deleting should be quick.
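
A minimal sketch of that five-minute flush, assuming the hypothetical
tables above and a cron job to drive it:

    -- Pin the cutoff so the INSERT and DELETE see the same rows:
    SET @cutoff = NOW() - INTERVAL 5 MINUTE;

    -- Move the "old" rows into the current MyISAM table:
    INSERT INTO samples_20070726 (sampled_at, value)
    SELECT sampled_at, value
    FROM feeder
    WHERE sampled_at < @cutoff;

    -- Then clear them out of the small InnoDB feeder:
    DELETE FROM feeder WHERE sampled_at < @cutoff;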

I've used both techniques, and others, to manage high-insert, large
tables, although I only reached about 500 million records over a
one-month period.

On Jul 26, 2007, at 6:17 PM, Andrew Armstrong wrote:

> I've already chosen a granularity for samples.
>
> 5 minute samples, for example, can expire after two days.
>
> Aggregated/summarized rows of this data (a 30 minute sample, for
> example, which is the aggregation of the past 30 minutes' worth of
> 5 minute samples in a given window) expire after a week, etc.
>
> I'm more concerned with why inserts begin to slow down so much as
> the table grows.
>
> -----Original Message-----
> From: Wallace Reis [mailto:[EMAIL PROTECTED]
> Sent: Friday, 27 July 2007 1:02 AM
> To: Andrew Armstrong
> Cc: mysql@lists.mysql.com
> Subject: Re: Data Warehousing and MySQL vs PostgreSQL
>
> On 7/26/07, Andrew Armstrong <[EMAIL PROTECTED]> wrote:
>> Do you have a suggestion to how this should be implemented?
>> Data is aggregated over time and summary rows are created.
>
> I think you didn't design your DW correctly.
> It should have just one very large table (the fact table).
> Data should never be deleted. What if your clients want to query the
> 'five minute sample' data after it has expired?
> You should decide on the data's granularity, and if you want to
> aggregate it, do roll-ups. Or you can create materialized views for
> these aggregates.
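>
> MySQL has no native materialized views, but you can emulate one with
> a summary table that is refreshed on a schedule. A rough sketch,
> with illustrative names:
>
>     CREATE TABLE samples_30min_mv (
>         window_start DATETIME NOT NULL PRIMARY KEY,
>         value        DOUBLE NOT NULL
>     );
>
>     -- Re-run periodically (e.g. from cron) to refresh the "view";
>     -- REPLACE overwrites any window already summarized:
>     REPLACE INTO samples_30min_mv
>     SELECT
>         FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(sampled_at) / 1800) * 1800)
>             AS window_start,
>         AVG(value)
>     FROM samples_5min
>     GROUP BY window_start;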
>
> -- 
> wallace reis/wreis
> Núcleo de Biologia Computacional e
> Gestão de Informações Biotecnológicas/LABBI
>

