Hi David, This looks like a very elegant solution and I will give it a try!
Thanks for the tip! Olivier. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Van Couvering Sent: Wednesday, July 18, 2007 8:46 PM To: Derby Discussion Subject: Re: Horizontal partitioning? Hi, Olivier, thanks for your response, I understand your situation now. The use of the term "horizontal partitioning" triggered a standard interpretation for me. One thing you could do is write a Java procedure, see http://wiki.apache.org/db-derby/DerbySQLroutines Using this approach, you could take as parameters the begin date, the end date, and then a string for the actual query (without the timestamp constraints). You could then compose a query against the relevant tables, given the timestamp constraints, and execute this query, which is returned as a result set to the user. David On 7/18/07, Olivier Chedru <[EMAIL PROTECTED]> wrote: > Hi David, > > The purpose of my application is to keep measurement data locally, and > Derby is perfect to manage a local database. Because of deployment > reasons, I cannot use a distributed database. > The measurement data never change (they are just dropped when their > lifetime is reached). This application aims at running 24/24, and I need > to keep the database size stable once its lifetime is reached. Database > users will want to query data over a given short period of time, with > reasonable response times. > > If I use a single table, I will fall in the following issues: > - Index maintenance will become more and more expensive as the table > grows. And it will grow a lot: typically, I will store 10000 rows per > second and keep 1 week of data. This would generate more than 6 billion > rows! > - Dropping obsolete data will mean using DELETE FROM, which is much more > expensive than dropping a table. And this will probably lead to database > fragmentation over time. > > So I try to proceed this way: > 1 - When some criteria is met (period of time, number of rows...), I > create a new table without indexes. > 2 - Data are stored using batch insert statements. > 3 - Once the table is considered full (using the same criteria as > above), I create the necessary indexes on the table. > 4 - Goto 1. > > To provide an easy access to data and hide all those tables, a view > seems very appropriate. Unfortunately, as shown in my first post, the > view does not take into account constraints on the timestamp columns, > and scans all tables. No need to say the query never ends... > > Any idea about this problem is welcome! > > Thanks, > Olivier. > > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > On Behalf Of David Van Couvering > Sent: Wednesday, July 18, 2007 7:09 AM > To: Derby Discussion > Subject: Re: Horizontal partitioning? > > Hm, if you're creating a view on all the tables, that must mean all > the tables are under a single database? If that's the case, then this > doesn't really sound like horizontal partitioning (which usually means > splitting it into multiple databases, each with its own disk and > usually each with its own CPU). > > If you're keeping everything within one database, then partitioning > across multiple tables really is not useful or necessary, as Derby > itself accomplishes quick access to any row in the table using the > index on the primary key. > > Normally, if you have N partitions, then you would have N databases on > N machines. There would be no way to create a view across all of > them. > > This is how folks like eBay and Google do it, for example. This > usually only works if the data in each partition is completely or > almost completely independent and you don't need to do queries that > span tables (e.g. you can issue the query against one specific > partition depending on things like timestamp, user id, geographic > location, etc). If your queries can't be isolated to a single > partition, then your application code is going to have to deal with > logic normally assigned to the database, such as sorting and merging > data, and now you're in the realm of a clustered database, and having > worked on two of these in the past, I think I can safely say you don't > want to go there... > > David > > On 7/16/07, Olivier Chedru <[EMAIL PROTECTED]> wrote: > > > > > > > > > > Hi all, > > > > > > > > I am trying to achieve horizontal partitioning with Derby. > > > > > > > > My application stores *a lot* of data as time passes, and it needs to > drop > > obsolete data. > > > > > > > > Table partitioning would be the ideal solution, but it is not > implemented in > > Derby. So I create one table per period of time. All tables have the > same > > columns, the first one being the data timestamp. Dropping obsolete > data is > > then made easy: just drop old tables. > > > > > > > > Now, I need to query data without knowing how tables are organized. > So, I > > create a view on all tables (the view is deleted/created every time a > data > > table is created/dropped): fine. > > > > However, checking the runtime statistics, I notice ALL tables are > scanned > > when I use the view! Note I added constraints on the Timestamp field. > > > > > > > > Sample SQL: > > > > > > > > create table t1 (ts timestamp, a int, b int, c int); > > > > create table t2 (ts timestamp, a int, b int, c int); > > > > create table t3 (ts timestamp, a int, b int, c int); > > > > alter table t1 add constraint t1_c check (ts >= '2007-07-16 > 09:00:00.0' and > > ts < '2007-07-16 09:01:00.0'); > > > > alter table t2 add constraint t2_c check (ts >= '2007-07-16 > 09:01:00.0' and > > ts < '2007-07-16 09:02:00.0'); > > > > alter table t3 add constraint t3_c check (ts >= '2007-07-16 > 09:02:00.0' and > > ts < '2007-07-16 09:03:00.0'); > > > > create view t_view as select * from t1 union select * from t2 union > select * > > from t3; > > > > CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1); > > > > CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1); > > > > select * from t_view where ts >= '2007-07-16 09:01:30.0' and ts < > > '2007-07-16 09:02:00.0'; > > > > VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS(); > > > > > > > > The displayed execution plan shows t1, t2 and t3 are scanned. I expect > only > > t2 is used. I also tried to force using constraints by defining the > view > > this way, but it did not improve the behavior: > > > > > > > > create view t_view as select * from t1 --DERBY-PROPERTIES > constraint=t1_c > > union select * from t2 --DERBY-PROPERTIES constraint=t2_c union select > * > > from t3 --DERBY-PROPERTIES constraint=t3_c; > > > > > > > > Any idea on how to achieve horizontal partitioning? > > > > > > > > Thanks! > > > > Olivier. >
