Re: Longitude Index: B-Tree or Cluster

John J Foster via 4D_Tech Tue, 14 Apr 2020 14:23:24 -0700

Hey Peter,

I have already added and split out the longitude into separates fields. So I 
have the original longitude as a real and I have it separated into two other 
fields.


There are only 360 integer longitudes (ignoring the decimal part) possible. 
Potentially -180 through +180. In this case Cluster index seems appropriate in 
theory. The separated text field that look like this: “-117”, “-047”, “+002”, 
etc. And there are over 4.7 million records.

Here’s a few example there are:

“-117” -> 2,034
“-047” -> 2,658
“+002” -> 14,217
“+045” -> 21,255
and so on …

Staring with prefix:
“+” -> 3,740,510
“-“ -> 954,774

Almost 4 times as many “+” prefixed records.

I can affirm that searching by the longitude field as a real (indexed as 
b-tree) is slower on the first pass taking a few seconds. In fact the search 
requires multiple lines to find a search where longitude returns all -117 and 
thus requires a longitude search between -117 and -118.

But after that first query it seems quite fast like a 1/2 second or so. Likely 
because the index has been loaded.

I am not seeing any great speed differences between the pure real and the 
longitude degree as text.

> So my advice would be to go for representation of position with reals and use 
> standard index. Although speed difference may be small for middle-size data 
> (how many locations there may be?)


Over 4.7 million records.

I’ll keep experimenting and see which one, if any, seems acceptable.

Appreciate,
John…


> On Apr 14, 2020, at 1:22 PM, Peter Bozek <[email protected]> wrote:
> 
> John,
> 
>  I do not think using text for dat that can be represented by number is a 
> good idea. It require more memory - do not forget that Unicode uses 2 bytes 
> for each character - and manipulation of bigger data structures is slower, 
> but, especially, string manipulation, including comparison, depends on a 
> locale, and is using a build-in ICU library. It means that, instead of 
> comparing two numbers, you have all the overhead of calling a library 
> function that applies potentially multi-pass comparison algorithm using 
> mapping tables to two strings. Definitely, much slower.
> 
> Second, regarding standard and cluster index: as I understand it - and I may 
> be wrong, 4D in general does not publish technical info - B-Trees (there are 
> several types of them) offer optimal performance when result of search 
> contains few records, can well do a range search, as traversing the tree is 
> reasonably fast, but may be less effective when returning a big selection, as 
> building a selection can be time consuming. How much slower depends, but IMHO 
> it will be still pretty fast, unless the selection is huge.
> 
> Cluster index is suitable when the result of search is a (potentially large) 
> selection. While B-tree node stores a reference to a record, Cluster index 
> node stores a selection, so no selection building is needed. However, if I 
> would design such an index, I would use hash table for cluster index. Hash 
> table is much faster when you need to find one particular value, but is 
> totally unsuitable for range searches, as it cannot be effectively traversed. 
> That make a perfect sense for foreign key indexes - when loading 1-record, 
> you have immediately a selection of N-records available, and you (almost) 
> never do a search for range of foreign keys (with UUID keys, such search 
> would not make any sense.)
> 
> So my advice would be to go for representation of position with reals and use 
> standard index. Although speed difference may be small for middle-size data 
> (how many locations there may be?)
> 
> HTH,
> 
> Peter Bozek
> 
> On Tue, Apr 14, 2020 at 9:17 PM John J Foster via 4D_Tech 
> <[email protected]> wrote:
> Hi Arnaud,
> 
> I have split out the longitudes into a text and real component. I have 
> formatted the longitudes as “000” and if west of GMT prefixed with “-“ and if 
> east prefixed with “+“. I'll do something similar with latitudes and then try 
> building some radial searches and see.
> 
> Since all string is text is there a still a way to tell 4D that a field and 
> thus an index is a fixed size?
> 
> Anyway we shall see if theory and implementation matches!
> 
> Appreciate,
> John…
> 
> 
> 
> > On Apr 14, 2020, at 12:00 PM, [email protected] wrote:
> > 
> > From: Arnaud de Montard <[email protected] <mailto:[email protected]>>
> > Subject: Re: Longitude Index: B-Tree or Cluster
> > Date: April 14, 2020 at 2:23:26 AM PDT
> > To: 4D iNug Technical <[email protected] <mailto:[email protected]>>
> > 
> > 
> > 
> >> Le 13 avr. 2020 à 19:36, Bernard Escaich via 4D_Tech <[email protected] 
> >> <mailto:[email protected]>> a écrit :
> >> 
> >> Hi John,
> >> 
> >> From the top of my head, indexes on long are less efficient than on text 
> >> values ; I have a doubt, perhaps only for updating.
> > 
> > I'd think the opposite, an index based on a "fixed length" field is always 
> > more efficient (search, update…) than one based on a "variable length" 
> > field. 
> > 
> > -- 
> > Arnaud de Montard 
> 
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> Archive:  http://lists.4d.com/archives.html
> Options: https://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:[email protected]
> **********************************************************************
> 
> 
> -- 
> --
> 
> Peter Bozek

**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[email protected]
**********************************************************************

Re: Longitude Index: B-Tree or Cluster

Reply via email to