Re: Longitude Index: B-Tree or Cluster

Bernard Escaich via 4D_Tech Tue, 14 Apr 2020 15:11:24 -0700

John,

You say "Show me all cities within 4 minutes from this longitude/latitude."


I propose to search in a square around your location, ie make a search lat +- 4 
miles and long +- 4 miles.
You will get a few occurrences and then refine the search to a circle with 
geodesic functions.

Cordialement,

Bernard Escaich



> Le 14 avr. 2020 à 23:22, John J Foster via 4D_Tech <[email protected]> a 
> écrit :
> 
> Hey Peter,
> 
> I have already added and split out the longitude into separates fields. So I 
> have the original longitude as a real and I have it separated into two other 
> fields.
> 
> There are only 360 integer longitudes (ignoring the decimal part) possible. 
> Potentially -180 through +180. In this case Cluster index seems appropriate 
> in theory. The separated text field that look like this: “-117”, “-047”, 
> “+002”, etc. And there are over 4.7 million records.
> 
> Here’s a few example there are:
> 
> “-117” -> 2,034
> “-047” -> 2,658
> “+002” -> 14,217
> “+045” -> 21,255
> and so on …
> 
> Staring with prefix:
> “+” -> 3,740,510
> “-“ -> 954,774
> 
> Almost 4 times as many “+” prefixed records.
> 
> I can affirm that searching by the longitude field as a real (indexed as 
> b-tree) is slower on the first pass taking a few seconds. In fact the search 
> requires multiple lines to find a search where longitude returns all -117 and 
> thus requires a longitude search between -117 and -118.
> 
> But after that first query it seems quite fast like a 1/2 second or so. 
> Likely because the index has been loaded.
> 
> I am not seeing any great speed differences between the pure real and the 
> longitude degree as text.
> 
>> So my advice would be to go for representation of position with reals and 
>> use standard index. Although speed difference may be small for middle-size 
>> data (how many locations there may be?)
> 
> 
> Over 4.7 million records.
> 
> I’ll keep experimenting and see which one, if any, seems acceptable.
> 
> Appreciate,
> John…
> 
> 
>> On Apr 14, 2020, at 1:22 PM, Peter Bozek <[email protected]> wrote:
>> 
>> John,
>> 
>> I do not think using text for dat that can be represented by number is a 
>> good idea. It require more memory - do not forget that Unicode uses 2 bytes 
>> for each character - and manipulation of bigger data structures is slower, 
>> but, especially, string manipulation, including comparison, depends on a 
>> locale, and is using a build-in ICU library. It means that, instead of 
>> comparing two numbers, you have all the overhead of calling a library 
>> function that applies potentially multi-pass comparison algorithm using 
>> mapping tables to two strings. Definitely, much slower.
>> 
>> Second, regarding standard and cluster index: as I understand it - and I may 
>> be wrong, 4D in general does not publish technical info - B-Trees (there are 
>> several types of them) offer optimal performance when result of search 
>> contains few records, can well do a range search, as traversing the tree is 
>> reasonably fast, but may be less effective when returning a big selection, 
>> as building a selection can be time consuming. How much slower depends, but 
>> IMHO it will be still pretty fast, unless the selection is huge.
>> 
>> Cluster index is suitable when the result of search is a (potentially large) 
>> selection. While B-tree node stores a reference to a record, Cluster index 
>> node stores a selection, so no selection building is needed. However, if I 
>> would design such an index, I would use hash table for cluster index. Hash 
>> table is much faster when you need to find one particular value, but is 
>> totally unsuitable for range searches, as it cannot be effectively 
>> traversed. That make a perfect sense for foreign key indexes - when loading 
>> 1-record, you have immediately a selection of N-records available, and you 
>> (almost) never do a search for range of foreign keys (with UUID keys, such 
>> search would not make any sense.)
>> 
>> So my advice would be to go for representation of position with reals and 
>> use standard index. Although speed difference may be small for middle-size 
>> data (how many locations there may be?)
>> 
>> HTH,
>> 
>> Peter Bozek
>> 
>> On Tue, Apr 14, 2020 at 9:17 PM John J Foster via 4D_Tech 
>> <[email protected]> wrote:
>> Hi Arnaud,
>> 
>> I have split out the longitudes into a text and real component. I have 
>> formatted the longitudes as “000” and if west of GMT prefixed with “-“ and 
>> if east prefixed with “+“. I'll do something similar with latitudes and then 
>> try building some radial searches and see.
>> 
>> Since all string is text is there a still a way to tell 4D that a field and 
>> thus an index is a fixed size?
>> 
>> Anyway we shall see if theory and implementation matches!
>> 
>> Appreciate,
>> John…
>> 
>> 
>> 
>>> On Apr 14, 2020, at 12:00 PM, [email protected] wrote:
>>> 
>>> From: Arnaud de Montard <[email protected] <mailto:[email protected]>>
>>> Subject: Re: Longitude Index: B-Tree or Cluster
>>> Date: April 14, 2020 at 2:23:26 AM PDT
>>> To: 4D iNug Technical <[email protected] <mailto:[email protected]>>
>>> 
>>> 
>>> 
>>>> Le 13 avr. 2020 à 19:36, Bernard Escaich via 4D_Tech <[email protected] 
>>>> <mailto:[email protected]>> a écrit :
>>>> 
>>>> Hi John,
>>>> 
>>>> From the top of my head, indexes on long are less efficient than on text 
>>>> values ; I have a doubt, perhaps only for updating.
>>> 
>>> I'd think the opposite, an index based on a "fixed length" field is always 
>>> more efficient (search, update…) than one based on a "variable length" 
>>> field. 
>>> 
>>> -- 
>>> Arnaud de Montard 
>> 
>> **********************************************************************
>> 4D Internet Users Group (4D iNUG)
>> Archive:  http://lists.4d.com/archives.html
>> Options: https://lists.4d.com/mailman/options/4d_tech
>> Unsub:  mailto:[email protected]
>> **********************************************************************
>> 
>> 
>> -- 
>> --
>> 
>> Peter Bozek
> 
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> Archive:  http://lists.4d.com/archives.html
> Options: https://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:[email protected]
> **********************************************************************

**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[email protected]
**********************************************************************

Re: Longitude Index: B-Tree or Cluster

Reply via email to