> Ok here is what I will do. I will put the code up to sourceforge so you
> guys can check it out. Maybe you will spot a way to make fine grain fast.
Good idea. Some of us are busy with 1.7.0 release work, but it would allow
others to comment.

> But I will change my assumptions so that only a single thread is working
> on the tables for that query. I will switch over to using B-Tree rather
> than B-Link, which should make my code faster and simpler.

Thanks, let's see how it works out.

>> our approach? For example, if a node takes 16 bytes then we can store
>> the index for a million rows in 16MB of memory. On a

> Small nodes may be faster if you know the table will always be small and
> there are a lot of inserts and deletes. Search may be a little slower.
>
> If the table ever gets big, then you will have to work out of disk. Then
> big nodes (100 to 500 elements) work best.
>
> It would not be very hard to store the size of the node with the index.
> Then the size can be tuned depending on your use case.
>
> More RAM mostly helps the result sets and small data sets IMHO. A good
> cache scheme will just naturally take all the room it is given and then
> live in that area.

What I had in mind was a node as a fixed-size structure that contains a
pointer (or reference) to table data, which may be in a different file or
elsewhere in memory. This is more or less how Bob is using an in-memory
index which reads from a disk file. If we use nio, then the system will
manage the number of nodes that it keeps in memory. The same goes for the
actual table data, which will usually be in one or more (huge) files.

>> Also something to bear in mind is we need on-the-fly generation of
>> indexes for result sets. Currently result sets are held

> Ok. Should not really be a problem. A temp table should probably just
> act like a regular table anyway. Inserts into the index are sorted.
>
> Here is an idea... how can we hide disk access?
>
> Do you generate the entire result set before you return from the query?
> What if the result set is huge -- even big enough that it exceeds RAM?
>
> How about a result set that is only partially generated?
> Get all the rows you can get without paging to disk and then return.
>
> A second thread could then work in the background and 'pump' the result
> set as more of the query tables page in. The ResultSet would act more
> like a queue, with the 'pump' thread enqueuing items as they page in
> from disk and ResultSet.get() doing the dequeue.
>
> There are some queries for which this would not work, such as sorted
> queries or ones with min(), max(), sum()... but for many it would be ok,
> I think.
>
> This is a complicated system...

What I envisage is that when we build the result set, we build an index
(which could be an aggregate index) that will be used for sorting it. This
way we can also get the result set to contain unique values only, to deal
with subqueries such as SELECT .. WHERE X IN (SELECT ...). Rows of the
result set are stored in a separate nio file. The index is built fully,
then truncated if necessary (SELECT LIMIT N M ...). The index is nio disk
based. When we have finished, we fetch each pre-constructed row from the
nio file and return it to the client. There won't be any special case or
complication with this scheme; the system manages the caching so that
short result sets are never written to disk.

> Another problem is what happens if a second write query alters the table
> before the full result is read. The result could be invalidated in that
> case, I suppose. Or do we use transactions to handle that?

This is a very important issue. We've got to decide which transaction
isolation level we want to support and then review the indexing options in
that light.

On the whole, this whole thing needs a lot of thought before we commit
ourselves to a solution. One thing I am sure of is to keep things simple
and let the system do the housekeeping as much as possible.

Regards

Fred

_______________________________________________
hsqldb-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers
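P.S. The 'pump' thread idea discussed above maps naturally onto a
producer/consumer queue. Below is a minimal sketch in Java; every class and
method name in it is hypothetical (none of this is HSQLDB API). A background
thread enqueues rows as they page in from disk, the reader blocks on
dequeue, and a sentinel object marks the end of the result set.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the "pump" idea: a background thread enqueues
// rows as they become available; the client dequeues them one at a time.
class PumpedResultSet {
    // Sentinel marking the end of the result set.
    private static final Object[] END = new Object[0];

    // Bounded queue, so the pump thread blocks if the client falls behind.
    private final BlockingQueue<Object[]> rows = new ArrayBlockingQueue<>(1024);

    // Called by the pump thread for each row as it pages in from disk.
    public void enqueue(Object[] row) throws InterruptedException {
        rows.put(row);
    }

    // Called by the pump thread once the query is fully evaluated.
    public void finish() throws InterruptedException {
        rows.put(END);
    }

    // Called by the client; blocks until a row (or the end marker)
    // arrives. Returns null when the result set is exhausted.
    public Object[] next() throws InterruptedException {
        Object[] row = rows.take();
        return row == END ? null : row;
    }

    public static void main(String[] args) throws Exception {
        PumpedResultSet rs = new PumpedResultSet();

        // Pump thread: pretend these rows are paging in from disk.
        Thread pump = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    rs.enqueue(new Object[] { i, "row-" + i });
                }
                rs.finish();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pump.start();

        // Client: consume rows as they arrive, in FIFO order.
        for (Object[] row; (row = rs.next()) != null; ) {
            System.out.println(row[0] + " " + row[1]);
        }
        pump.join();
    }
}
```

As noted above, this only suits queries that can emit rows incrementally;
sorted queries and aggregates like min(), max(), sum() need the full input
before the first row can be returned.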