Hm, if you're creating a view on all the tables, that must mean all the tables are under a single database? If that's the case, then this doesn't really sound like horizontal partitioning (which usually means splitting it into multiple databases, each with its own disk and usually each with its own CPU).
If you're keeping everything within one database, then partitioning across multiple tables really is not useful or necessary, as Derby itself accomplishes quick access to any row in the table using the index on the primary key. Normally, if you have N partitions, then you would have N databases on N machines. There would be no way to create a view across all of them. This is how folks like eBay and Google do it, for example. This usually only works if the data in each partition is completely or almost completely independent and you don't need to do queries that span tables (e.g. you can issue the query against one specific partition depending on things like timestamp, user id, geographic location, etc). If your queries can't be isolated to a single partition, then your application code is going to have to deal with logic normally assigned to the database, such as sorting and merging data, and now you're in the realm of a clustered database, and having worked on two of these in the past, I think I can safely say you don't want to go there... David On 7/16/07, Olivier Chedru <[EMAIL PROTECTED]> wrote:
Hi all, I am trying to achieve horizontal partitioning with Derby. My application stores *a lot* of data as time passes, and it needs to drop obsolete data. Table partitioning would be the ideal solution, but it is not implemented in Derby. So I create one table per period of time. All tables have the same columns, the first one being the data timestamp. Dropping obsolete data is then made easy: just drop old tables. Now, I need to query data without knowing how tables are organized. So, I create a view on all tables (the view is deleted/created every time a data table is created/dropped): fine. However, checking the runtime statistics, I notice ALL tables are scanned when I use the view! Note I added constraints on the Timestamp field. Sample SQL: create table t1 (ts timestamp, a int, b int, c int); create table t2 (ts timestamp, a int, b int, c int); create table t3 (ts timestamp, a int, b int, c int); alter table t1 add constraint t1_c check (ts >= '2007-07-16 09:00:00.0' and ts < '2007-07-16 09:01:00.0'); alter table t2 add constraint t2_c check (ts >= '2007-07-16 09:01:00.0' and ts < '2007-07-16 09:02:00.0'); alter table t3 add constraint t3_c check (ts >= '2007-07-16 09:02:00.0' and ts < '2007-07-16 09:03:00.0'); create view t_view as select * from t1 union select * from t2 union select * from t3; CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1); CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1); select * from t_view where ts >= '2007-07-16 09:01:30.0' and ts < '2007-07-16 09:02:00.0'; VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS(); The displayed execution plan shows t1, t2 and t3 are scanned. I expect only t2 is used. I also tried to force using constraints by defining the view this way, but it did not improve the behavior: create view t_view as select * from t1 --DERBY-PROPERTIES constraint=t1_c union select * from t2 --DERBY-PROPERTIES constraint=t2_c union select * from t3 --DERBY-PROPERTIES constraint=t3_c; Any idea on how to achieve horizontal partitioning? Thanks! Olivier.
