Re: Longitude Index: B-Tree or Cluster

Tim Nevels via 4D_Tech Tue, 14 Apr 2020 14:13:12 -0700

On Apr 14, 2020, at 3:39 PM, Peter Bozek wrote:

> Cluster index is suitable when the result of search is a (potentially
> large) selection. While B-tree node stores a reference to a record, Cluster
> index node stores a selection, so no selection building is needed. However,
> if I would design such an index, I would use hash table for cluster index.
> Hash table is much faster when you need to find one particular value, but
> is totally unsuitable for range searches, as it cannot be effectively
> traversed. That make a perfect sense for foreign key indexes - when loading
> 1-record, you have immediately a selection of N-records available, and you
> (almost) never do a search for range of foreign keys (with UUID keys, such
> search would not make any sense.)


Many years ago at a 4D Summit I remember someone asking Laurent a question of 
how Cluster B-Tree indexes were implemented. He gave a fairly detailed answer. 

Cluster B-Tree indexes are a B-Tree index, but there is only 1 node for each 
index value. The data in the node can be several different things. 

- It could be a reference to a single record like you would have in a 
traditional B-Tree index. It’s just a record number. 

- If the node is for 2 or more records, an array of record numbers is stored in 
the node. He made a point of saying it is the same ARRAY LONGINT just like we 
have in the language, so 4 bytes per record.

- At some point — he didn’t go into when it would happen — 4D switches from an 
array of record numbers to a set of records. He said it’s the same type of 4D 
set that we have in the language where you have 1 bit representing each record 
that is included in the set, and all the size optimizations they do for sets 
using compression and whatever. 

The idea is to try and balance speed of converting that list of records — 
whether it is an array of record numbers or a set — and the amount of storage 
needed for the node.

So when you do a typical query using a Cluster B-Tree you get a single node. If 
the node is for 1 record is is just as fast to retrieve the record as with a 
normal B-Tree index. If the node contains an array of 10 records numbers, then 
the array is used to retrieve the 10 records and create a selection. 

But if the node contains a lot of record — who know exactly how many “a lot” is 
— then a set is used to create the selection of records. And that is very fast 
and efficient. That’s why a Cluster B-Tree index on a boolean field is always 
super fast for 10,000 records or 1,000,000 records. Using sets makes it very 
fast to build that selection as compared to using a normal B-Tree index where 
you would have to walk 10,000 nodes or 1,000,000 nodes of the tree to build of 
the selection. 

Also Cluster B-Tree indexes reduces the need to rebalance the tree as new 
records are added to the table since it usually just adds data to an existing 
node rather than always adding a new node for each new record.

Tim

*****************************************
Tim Nevels
Innovative Solutions
785-749-3444
[email protected]
*****************************************

**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[email protected]
**********************************************************************

Re: Longitude Index: B-Tree or Cluster

Reply via email to