Hey Peter, I have already added and split out the longitude into separates fields. So I have the original longitude as a real and I have it separated into two other fields.
There are only 360 integer longitudes (ignoring the decimal part) possible. Potentially -180 through +180. In this case Cluster index seems appropriate in theory. The separated text field that look like this: “-117”, “-047”, “+002”, etc. And there are over 4.7 million records. Here’s a few example there are: “-117” -> 2,034 “-047” -> 2,658 “+002” -> 14,217 “+045” -> 21,255 and so on … Staring with prefix: “+” -> 3,740,510 “-“ -> 954,774 Almost 4 times as many “+” prefixed records. I can affirm that searching by the longitude field as a real (indexed as b-tree) is slower on the first pass taking a few seconds. In fact the search requires multiple lines to find a search where longitude returns all -117 and thus requires a longitude search between -117 and -118. But after that first query it seems quite fast like a 1/2 second or so. Likely because the index has been loaded. I am not seeing any great speed differences between the pure real and the longitude degree as text. > So my advice would be to go for representation of position with reals and use > standard index. Although speed difference may be small for middle-size data > (how many locations there may be?) Over 4.7 million records. I’ll keep experimenting and see which one, if any, seems acceptable. Appreciate, John… > On Apr 14, 2020, at 1:22 PM, Peter Bozek <[email protected]> wrote: > > John, > > I do not think using text for dat that can be represented by number is a > good idea. It require more memory - do not forget that Unicode uses 2 bytes > for each character - and manipulation of bigger data structures is slower, > but, especially, string manipulation, including comparison, depends on a > locale, and is using a build-in ICU library. It means that, instead of > comparing two numbers, you have all the overhead of calling a library > function that applies potentially multi-pass comparison algorithm using > mapping tables to two strings. Definitely, much slower. > > Second, regarding standard and cluster index: as I understand it - and I may > be wrong, 4D in general does not publish technical info - B-Trees (there are > several types of them) offer optimal performance when result of search > contains few records, can well do a range search, as traversing the tree is > reasonably fast, but may be less effective when returning a big selection, as > building a selection can be time consuming. How much slower depends, but IMHO > it will be still pretty fast, unless the selection is huge. > > Cluster index is suitable when the result of search is a (potentially large) > selection. While B-tree node stores a reference to a record, Cluster index > node stores a selection, so no selection building is needed. However, if I > would design such an index, I would use hash table for cluster index. Hash > table is much faster when you need to find one particular value, but is > totally unsuitable for range searches, as it cannot be effectively traversed. > That make a perfect sense for foreign key indexes - when loading 1-record, > you have immediately a selection of N-records available, and you (almost) > never do a search for range of foreign keys (with UUID keys, such search > would not make any sense.) > > So my advice would be to go for representation of position with reals and use > standard index. Although speed difference may be small for middle-size data > (how many locations there may be?) > > HTH, > > Peter Bozek > > On Tue, Apr 14, 2020 at 9:17 PM John J Foster via 4D_Tech > <[email protected]> wrote: > Hi Arnaud, > > I have split out the longitudes into a text and real component. I have > formatted the longitudes as “000” and if west of GMT prefixed with “-“ and if > east prefixed with “+“. I'll do something similar with latitudes and then try > building some radial searches and see. > > Since all string is text is there a still a way to tell 4D that a field and > thus an index is a fixed size? > > Anyway we shall see if theory and implementation matches! > > Appreciate, > John… > > > > > On Apr 14, 2020, at 12:00 PM, [email protected] wrote: > > > > From: Arnaud de Montard <[email protected] <mailto:[email protected]>> > > Subject: Re: Longitude Index: B-Tree or Cluster > > Date: April 14, 2020 at 2:23:26 AM PDT > > To: 4D iNug Technical <[email protected] <mailto:[email protected]>> > > > > > > > >> Le 13 avr. 2020 à 19:36, Bernard Escaich via 4D_Tech <[email protected] > >> <mailto:[email protected]>> a écrit : > >> > >> Hi John, > >> > >> From the top of my head, indexes on long are less efficient than on text > >> values ; I have a doubt, perhaps only for updating. > > > > I'd think the opposite, an index based on a "fixed length" field is always > > more efficient (search, update…) than one based on a "variable length" > > field. > > > > -- > > Arnaud de Montard > > ********************************************************************** > 4D Internet Users Group (4D iNUG) > Archive: http://lists.4d.com/archives.html > Options: https://lists.4d.com/mailman/options/4d_tech > Unsub: mailto:[email protected] > ********************************************************************** > > > -- > -- > > Peter Bozek ********************************************************************** 4D Internet Users Group (4D iNUG) Archive: http://lists.4d.com/archives.html Options: https://lists.4d.com/mailman/options/4d_tech Unsub: mailto:[email protected] **********************************************************************

