On Apr 14, 2020, at 3:39 PM, Peter Bozek wrote: > Cluster index is suitable when the result of search is a (potentially > large) selection. While B-tree node stores a reference to a record, Cluster > index node stores a selection, so no selection building is needed. However, > if I would design such an index, I would use hash table for cluster index. > Hash table is much faster when you need to find one particular value, but > is totally unsuitable for range searches, as it cannot be effectively > traversed. That make a perfect sense for foreign key indexes - when loading > 1-record, you have immediately a selection of N-records available, and you > (almost) never do a search for range of foreign keys (with UUID keys, such > search would not make any sense.)
Many years ago at a 4D Summit I remember someone asking Laurent a question of how Cluster B-Tree indexes were implemented. He gave a fairly detailed answer. Cluster B-Tree indexes are a B-Tree index, but there is only 1 node for each index value. The data in the node can be several different things. - It could be a reference to a single record like you would have in a traditional B-Tree index. It’s just a record number. - If the node is for 2 or more records, an array of record numbers is stored in the node. He made a point of saying it is the same ARRAY LONGINT just like we have in the language, so 4 bytes per record. - At some point — he didn’t go into when it would happen — 4D switches from an array of record numbers to a set of records. He said it’s the same type of 4D set that we have in the language where you have 1 bit representing each record that is included in the set, and all the size optimizations they do for sets using compression and whatever. The idea is to try and balance speed of converting that list of records — whether it is an array of record numbers or a set — and the amount of storage needed for the node. So when you do a typical query using a Cluster B-Tree you get a single node. If the node is for 1 record is is just as fast to retrieve the record as with a normal B-Tree index. If the node contains an array of 10 records numbers, then the array is used to retrieve the 10 records and create a selection. But if the node contains a lot of record — who know exactly how many “a lot” is — then a set is used to create the selection of records. And that is very fast and efficient. That’s why a Cluster B-Tree index on a boolean field is always super fast for 10,000 records or 1,000,000 records. Using sets makes it very fast to build that selection as compared to using a normal B-Tree index where you would have to walk 10,000 nodes or 1,000,000 nodes of the tree to build of the selection. Also Cluster B-Tree indexes reduces the need to rebalance the tree as new records are added to the table since it usually just adds data to an existing node rather than always adding a new node for each new record. Tim ***************************************** Tim Nevels Innovative Solutions 785-749-3444 [email protected] ***************************************** ********************************************************************** 4D Internet Users Group (4D iNUG) Archive: http://lists.4d.com/archives.html Options: https://lists.4d.com/mailman/options/4d_tech Unsub: mailto:[email protected] **********************************************************************

