At 12/19/2006 11:01 PM, Ashley M. Kirchner wrote:
I'm starting to log weather data to a database and I'm trying to figure out the best way to create the tables. The reports come in every minute of every hour, 24 hours a day. Eventually, I'd like to do some calculations on the statistics, displaying daily values (which can be broken down to hourly), but also daily and monthly averages.

To me, it doesn't make sense to dump everything into one big table, but I can't figure out the best way to break it down either. Keep in mind that the only data I have is what comes in for that minute. The daily averages I have to calculate myself (later). But I can't see one large table being very effective when it comes to calculating that stuff.

So, how should I break the tables down? Create a new table every day (20061219_data, 20061220_data, etc.) and insert all the values in it? Or break it down per value (temp_table, humidity_table, etc.) and insert daily data in them?


(This question doesn't pertain to PHP but to database techniques; you may get better and more friendly advice on a MySQL list.)

I'm curious: why doesn't it make sense to you to keep all the data in one big table? MySQL is certainly robust enough to keep a whack of data together. Only when table size becomes a problem, say with the practicality of backups, the speed of queries, or the size of the hard drive, do you need to worry about breaking it down into smaller chunks. But every database has its limits, and you're smart to decide up front how to split it up.

A major factor in how you choose to store your data should be how it will be used. What kinds of queries will be most common? What time-spans do they cover? Do they usually interrogate just one parameter, e.g. either temperature or humidity but not both, or do they often query two or more parameters in search of correlations?

Without knowing more, my first tendency would be to keep all the data in a single table. One table would actually occupy less disk space than splitting the data into parallel tables, because some fields would need to be duplicated in every table (timestamp, record id, perhaps location, etc.). I might choose to split the data into one table per year for ease of backup and archiving. Another approach is to let 5 or 10 years of data accumulate in a single table, then archive (copy out & delete) the oldest year's data every year to keep the table size manageable.
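
For illustration only, a minimal single-table layout might look something like this (the table and column names here are invented; substitute whatever your station actually reports). The index on the timestamp is what keeps the time-range queries fast:

<?php
// Sketch of a one-row-per-minute readings table (hypothetical names).
$db = new mysqli('localhost', 'user', 'password', 'weather');

$db->query("
    CREATE TABLE IF NOT EXISTS readings (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
        taken_at    DATETIME     NOT NULL,      -- the minute the report came in
        temperature DECIMAL(5,2) NULL,
        humidity    DECIMAL(5,2) NULL,
        pressure    DECIMAL(7,2) NULL,
        PRIMARY KEY (id),
        KEY idx_taken_at (taken_at)             -- most queries filter on time
    )
");
?>

The yearly archive step would then just be an INSERT INTO an archive table ... SELECT with a WHERE YEAR(taken_at) = ... clause, followed by a DELETE with the same WHERE.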

The daily averages I have to calculate myself (later). But I can't see one large table being very effective when it comes to calculating that stuff.

I believe it will be more efficient to calculate averages from a single table than from multiple tables. In both cases the database engine has to select the same fields to calculate the averages, but if the data is split into separate tables the engine has to query each table separately before combining the results.
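
As a sketch (again assuming the hypothetical readings table above), the daily averages fall out of a single GROUP BY rather than one query per table:

<?php
// Sketch: daily temperature/humidity averages from the single readings table.
$db = new mysqli('localhost', 'user', 'password', 'weather');

$result = $db->query("
    SELECT DATE(taken_at)   AS day,
           AVG(temperature) AS avg_temp,
           AVG(humidity)    AS avg_humidity
    FROM readings
    WHERE taken_at >= '2006-12-01'
    GROUP BY DATE(taken_at)
    ORDER BY day
");

while ($row = $result->fetch_assoc()) {
    printf("%s  temp %.1f  humidity %.1f\n",
           $row['day'], $row['avg_temp'], $row['avg_humidity']);
}
?>

Monthly averages are the same query grouped by YEAR(taken_at), MONTH(taken_at) instead.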

Regards,
Paul
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
