Hi Gerd

Last year it was reported that string ordering with the '#' character was
incorrect. This was because, in the sort/cp*.txt files, the relevant line
with the '#' was taken as a comment.
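A minimal sketch of the failure follows. It assumes that a '#' starts a comment anywhere on a line; this is not mkgmap's actual reader, and the class and method names are hypothetical. The sketch shows a line " < #" losing its character, while the 4-hex-digit form " < 0023" (which the patch below switches to) survives.

```java
/**
 * Hypothetical illustration only, not mkgmap's sort-file reader. It assumes a
 * reader that drops everything from the first '#' to the end of a line, which
 * is enough to show how " < #" lost its character and why " < 0023" does not.
 */
public class CommentStripDemo {

    /** Remove an assumed '#' comment and the surrounding whitespace. */
    static String stripComment(String line) {
        int hash = line.indexOf('#');
        String body = (hash >= 0) ? line.substring(0, hash) : line;
        return body.trim();
    }

    /** Treat a 4-hex-digit token such as "0023" as a code point; anything else is literal. */
    static String decodeToken(String token) {
        boolean isHex = token.length() == 4
                && token.chars().allMatch(c -> Character.digit(c, 16) >= 0);
        return isHex ? new String(Character.toChars(Integer.parseInt(token, 16))) : token;
    }

    /**
     * Return what a single-token "< x" line introduces, or null if nothing is
     * left after comment stripping. Multi-token lines such as "< a,A" are not split.
     */
    static String primaryEntry(String line) {
        String body = stripComment(line);
        if (!body.startsWith("<")) {
            return null;
        }
        String rest = body.substring(1).trim();
        return rest.isEmpty() ? null : decodeToken(rest);
    }

    public static void main(String[] args) {
        System.out.println(primaryEntry(" < #"));    // null: the '#' was read as a comment
        System.out.println(primaryEntry(" < 0023")); // #: the hex form survives
        System.out.println(primaryEntry(" < &"));    // &
    }
}
```

Writing the character as a code point sidesteps the question of how the reader handles comments.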

I had a patch that fixed all the files, but it also attempted to do more
with ß/ss and diphthongs.
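The following hypothetical illustration shows why the ß/ss question is contentious. With `expand ß to  s s`, "maße" sorts level with "masse" at the primary level. Giving ß its own PRIMARY after "s" moves it after every "mas..." name; according to the quoted thread, the earlier patch did this for cp1252. The sketch handles lower-case letters only. SharpSDemo, ALPHABET and the sample words are assumptions, not mkgmap code. Ties that a real collation would break at a lower strength are simply left in input order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, lower-case-only illustration; not mkgmap's Sort class.
 * Contrasts "expand sharp s (U+00DF) to s s" with giving sharp s its own
 * PRIMARY position immediately after s.
 */
public class SharpSDemo {

    /** Assumed primary order: sharp s sits between s and t. */
    static final String ALPHABET = "abcdefghijklmnopqrs\u00dftuvwxyz";

    /** Expansion rule corresponding to "expand U+00DF to s s". */
    static final Map<Character, String> EXPANSIONS = Map.of('\u00df', "ss");

    static String expand(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(EXPANSIONS.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    /** Expand first, then compare; plain char order is adequate for a-z. */
    static final Comparator<String> EXPANDED = Comparator.comparing(SharpSDemo::expand);

    /** Compare char by char using the position in ALPHABET as the primary weight. */
    static final Comparator<String> OWN_PRIMARY = (a, b) -> {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int diff = ALPHABET.indexOf(a.charAt(i)) - ALPHABET.indexOf(b.charAt(i));
            if (diff != 0) {
                return diff;
            }
        }
        return Integer.compare(a.length(), b.length());
    };

    /** Stable sort of a copy, so equal keys keep their input order. */
    static List<String> sorted(List<String> words, Comparator<String> cmp) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("mast", "ma\u00dfe", "masse");
        // Expanded: the sharp-s word ties with "masse" (input order decides) and both precede "mast".
        System.out.println(sorted(words, EXPANDED));
        // Own primary: the sharp-s word moves after every "mas..." word.
        System.out.println(sorted(words, OWN_PRIMARY));
    }
}
```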

I've done another patch that doesn't have any contentious changes: it just
fixes the '#', makes the layout consistent between the files, increments
the version/id2 values and makes slight improvements to the documentation.
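The reason for incrementing id2 is discussed further down the thread: the gmapsupp builder warns, but still builds, when the .img files do not all carry the same id1/id2. Below is a minimal sketch of that kind of check. It assumes a hypothetical SortId record per tile. This is not the mkgmap gmapsupp code, and the id2 values in main are examples only.

```java
import java.util.List;

/**
 * Hypothetical sketch of a per-tile sort consistency check; SortId and
 * checkConsistent are illustrative names, not mkgmap's gmapsupp code.
 */
public class SortIdCheck {

    /** Codepage and id1/id2 recorded in one tile's sort description (assumed model). */
    record SortId(int codepage, int id1, int id2) {}

    /** Returns a warning message if the tiles disagree, or null if they are consistent. */
    static String checkConsistent(List<SortId> tiles) {
        if (tiles.isEmpty()) {
            return null;
        }
        SortId first = tiles.get(0);
        for (int i = 1; i < tiles.size(); i++) {
            SortId other = tiles.get(i);
            if (!first.equals(other)) {
                return "Warning: tile " + i + " has sort " + other + " but tile 0 has " + first;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // One tile built before an id2 increment and one after (example values).
        List<SortId> tiles = List.of(new SortId(1252, 7, 2), new SortId(1252, 7, 3));
        String warning = checkConsistent(tiles);
        // Only a warning is reported; a caller could still go on to build the output.
        System.out.println(warning != null ? warning : "sort ids consistent");
    }
}
```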

Ticker

On Tue, 2022-01-11 at 14:00 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> > 
> > If you don't mind, I'd like to postpone this patch until the active
> > branches are merged into trunk.
> > 
> > Gerd
> > 
> > 
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf of
> > Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > Sent: Tuesday, 11 January 2022 11:25
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] Fix and augment sort definitions
> > 
> > Hi Gerd
> > 
> > Yes - gmapsupp builder gives a warning if id1/id2 are not consistent in
> > all the .img files. It is just a warning and gmapsupp is built anyway
> > and I think the warning can be ignored. gmapi doesn't notice.
> > 
> > Almost all of the significant sorting where the Garmin device... needs
> > to know the sort details happens in Mdr, so this isn't a problem.
> > 
> > Other uses are mostly for de-duping/efficient processing, so these
> > shouldn't matter either.
> > 
> > However the LBL file does hold id1/id2 and many sections (Countries,
> > Regions, Cities, Zips, POIs) are sorted so the effect here is unknown.
> > 
> > If using --latin2 / 1252, the only change in ordering is around AE/OE
> > diphthongs.
> > 
> > Within the same commit or build as sortResource_v2, the attached
> > sortMashExp.patch should be applied, as it affects the binary SRT file
> > and I don't want to increment all the id2's again. This patch changes
> > the sort.expand TERTIARY mashing from 2 to 3, which is slightly more
> > consistent with the Garmin SRT binaries I've seen and allows SrtDisplay
> > to show expansions with what looks like a meaningful case.
> > 
> > Ticker
> > 
> > On Tue, 2022-01-11 at 06:31 +0000, Gerd Petermann wrote:
> > > > Hi Ticker,
> > > > 
> > > > Didn't try it: will mkgmap complain when building an indexed
> > > > gmapi/gmapsupp where some tiles were freshly compiled with the new
> > > > version and others with an older one (like Felix and Carlos do)?
> > > > 
> > > > Gerd
> > > > 
> > > > ________________________________________
> > > > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf
> > > > of Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > > > Sent: Monday, 10 January 2022 12:04
> > > > To: Development list for mkgmap
> > > > Subject: Re: [mkgmap-dev] Fix and augment sort definitions
> > > > 
> > > > Hi Gerd
> > > > 
> > > > What I meant was that keyboards/devices don't normally have ways of
> > > > entering the single chars "…", "¼", "½", "¾", "™".
> > > > 
> > > > Names with these might be presented by Garmin software after some
> > > > initial chars have been entered and you can then select the complete
> > > > name that contains these chars.
> > > > 
> > > > I didn't see a good reason to remove the expand for these and find
> > > > some arbitrary sort PRIMARY for them. No one has complained about
> > > > them. Also cp65001 had over 1000 expands and I really don't want to
> > > > start touching these.
> > > > 
> > > > Ticker
> > > > 
> > > > 
> > > > On Mon, 2022-01-10 at 10:29 +0000, Gerd Petermann wrote:
> > > > > > Hi Ticker,
> > > > > > 
> > > > > > I've committed displaySrt_v2.patch .
> > > > > > 
> > > > > > I don't fully understand the comment
> > > > > > "Leave the above because no method of inputting them anyway and
> > > > > > unlikely at start of names."
> > > > > > 
> > > > > > It is possible to enter these characters in MapSource and I think
> > > > > > MapSource uses MDR12 when you type only a few characters for the
> > > > > > name of a POI and don't pick up an entry from the list.
> > > > > > 
> > > > > > Gerd
> > > > > > 
> > > > > > ________________________________________
> > > > > > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf
> > > > > > of Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > > > > > Sent: Monday, 10 January 2022 11:20
> > > > > > To: Development list for mkgmap
> > > > > > Subject: Re: [mkgmap-dev] Fix and augment sort definitions
> > > > > > 
> > > > > > Hi Gerd
> > > > > > 
> > > > > > I tried various approaches to fixing "Find" when the fixed-length
> > > > > > Mdr17 (maybe also Mdr12) prefix contains sort.expand chars and
> > > > > > couldn't make it work. I could document these attempts in Sort.java
> > > > > > if you feel this is worthwhile.
> > > > > > 
> > > > > > New patch attached that, for cp1252, leaves "ß" as its own PRIMARY
> > > > > > after "s". Moved æ,Æ etc to be PRIMARIES on the grounds that their
> > > > > > behaviour will be the same as "ß". Made cp1254 consistent as it had
> > > > > > similar partial fixes.
> > > > > > 
> > > > > > The main reason for the patch is to fix all the other sort/cp*.txt
> > > > > > files that had the line " > #", which was taken as a comment,
> > > > > > resulting in "#" being ignored in collation.
> > > > > > 
> > > > > > With the Display patch (sent previously, but also attached here),
> > > > > > it can reproduce the resource/sort file from the binary SRT section.
> > > > > > 
> > > > > > Ticker
> > > > > > 
> > > > > > _______________________________________________
> > > > > > mkgmap-dev mailing list
> > > > > > mkgmap-dev@lists.mkgmap.org.uk
> > > > > > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> > > > 
> > > > 

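The thread above discusses `expand` lines (for example `expand … to  . . .` and `expand ½ to  1 / 2`). It also mentions the difficulty of making "Find" work when a fixed-length Mdr17 prefix contains expanded characters. The sketch below is only a hedged illustration of why this is awkward. It parses expand lines and shows that truncating before or after expansion gives different keys. ExpandDemo, PREFIX_LEN and the 4-character length are hypothetical, and this is not how mkgmap builds Mdr17.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: parses "expand X to  a b c" lines like those at the end of
 * the sort/cp*.txt files and applies them to a name. Not mkgmap's code; PREFIX_LEN
 * is an arbitrary stand-in for a fixed-length index prefix.
 */
public class ExpandDemo {

    static final int PREFIX_LEN = 4; // assumption, not the real Mdr17 prefix size

    /** Parse lines of the form "expand X to  r1 r2 ..." into code point -> replacement. */
    static Map<Integer, String> parseExpands(String... lines) {
        Map<Integer, String> map = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            // Expected shape: ["expand", X, "to", r1, r2, ...]; anything else is skipped.
            if (parts.length < 4 || !parts[0].equals("expand") || !parts[2].equals("to")) {
                continue;
            }
            StringBuilder replacement = new StringBuilder();
            for (int i = 3; i < parts.length; i++) {
                replacement.append(parts[i]);
            }
            map.put(parts[1].codePointAt(0), replacement.toString());
        }
        return map;
    }

    /** Replace every code point that has an expansion; copy the rest unchanged. */
    static String expand(String name, Map<Integer, String> expands) {
        StringBuilder sb = new StringBuilder();
        name.codePoints().forEach(cp ->
                sb.append(expands.getOrDefault(cp, new String(Character.toChars(cp)))));
        return sb.toString();
    }

    /** Keep at most PREFIX_LEN chars. */
    static String prefix(String s) {
        return s.length() <= PREFIX_LEN ? s : s.substring(0, PREFIX_LEN);
    }

    public static void main(String[] args) {
        Map<Integer, String> expands =
                parseExpands("expand \u00bd to  1 / 2", "expand \u2122 to  T M");
        String name = "\u00bd Moon";
        // Expanding first and then truncating keeps fewer of the original characters ...
        System.out.println(prefix(expand(name, expands)));  // prints "1/2 "
        // ... than truncating the raw name first and expanding afterwards.
        System.out.println(expand(prefix(name), expands));  // prints "1/2 Mo"
    }
}
```

The two results differ, so a lookup has to agree on which order was used when the prefix was stored.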

Index: resources/sort/README
===================================================================
--- resources/sort/README	(revision 4915)
+++ resources/sort/README	(working copy)
@@ -35,22 +35,24 @@
 I believe that these are arbitary identifiers.  Here is a registry of
 values we are using.  If you make a variation on a code-page
 sort-order then give it a different id2 value.
+It is believed that having sorts with the same id1/id2 but different data loaded
+on the same device will give unexpected results.
 
-code-page  id1  id2
+code-page  id1  description
 
-1250       12   1
-1251        8   1
-1252        7   2
-1253       13   1
-1254       14   1
-1255       15   1
-1256       16   1
-1257       17   1
-1258       18   1
-874        11   1
-932         9   1
-936         5   1
-949        10   1
+1250       12   Central European sort
+1251        8   Cyrillic sort
+1252        7   Western European sort
+1253       13   Greek sort
+1254       14   Turkish sort
+1255       15   Hebrew sort
+1256       16?9 Arabic sort		cp1256.txt has id1=9, original version of this doc said 16
+1257       17   Latin Baltic sort
+1258       18   Vietnamese sort
+874        11   Thai. 8-bit		not implemented
+932         9   Japanese. Shift JIS	not implemented. Note id1=9 used by 1256
+936         5   Simplified Chinese	not implemented
+949        10   Korean. Unified Hangul	not implemented
 
-65001      19   4
-0          0    0
+65001      19   Unicode sort
+0          0    ASCII 7-bit sort
Index: resources/sort/cp0.txt
===================================================================
--- resources/sort/cp0.txt	(revision 4915)
+++ resources/sort/cp0.txt	(working copy)
@@ -1,9 +1,11 @@
 codepage 0
 id1 0
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "ASCII 7-bit sort"
 
 characters
+
 =0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f,0001,0002,0003,0004,0005,0006,0007
  < 0009
  < 000a
@@ -32,7 +34,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < `
  < ^
@@ -79,3 +81,5 @@
  < x,X
  < y,Y
  < z,Z
+
+# ends
Index: resources/sort/cp1250.txt
===================================================================
--- resources/sort/cp1250.txt	(revision 4915)
+++ resources/sort/cp1250.txt	(working copy)
@@ -1,9 +1,11 @@
 codepage 1250
 id1 12
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Central European sort"
 
 characters
+
 =0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=00ad,0001,0002,0003,0004,0005,0006,0007
  < 0009
  < 000a
@@ -45,7 +47,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -120,3 +122,5 @@
 expand ˛ to  § 0020
 expand ß to  s s
 expand ™ to  T M
+
+# ends
Index: resources/sort/cp1251.txt
===================================================================
--- resources/sort/cp1251.txt	(revision 4915)
+++ resources/sort/cp1251.txt	(working copy)
@@ -1,9 +1,11 @@
 codepage 1251
 id1 8
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Cyrillic sort"
 
 characters
+
 =0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=00ad,0001,0002,0003,0004,0005,0006,0007
  < 0009
  < 000a
@@ -45,7 +47,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -152,7 +154,8 @@
  < э,Э
  < ю,Ю
  < я,Я
-
 expand … to  . . .
 expand № to  N o
 expand ™ to  T M
+
+# ends
Index: resources/sort/cp1253.txt
===================================================================
--- resources/sort/cp1253.txt	(revision 4915)
+++ resources/sort/cp1253.txt	(working copy)
@@ -1,6 +1,7 @@
 codepage 1253
 id1 13
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Greek sort"
 
 characters
@@ -47,7 +48,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -140,3 +141,5 @@
 expand … to  . . .
 expand ½ to  1 / 2
 expand ™ to  T M
+
+# ends
Index: resources/sort/cp1254.txt
===================================================================
--- resources/sort/cp1254.txt	(revision 4915)
+++ resources/sort/cp1254.txt	(working copy)
@@ -1,10 +1,12 @@
 codepage 1254
 id1 14
-id2 1
+# 12-Oct-2023 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Turkish sort"
 
 characters
-= 0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=00ad,0001,0002,0003,0004,0005,0006,0007
+
+=0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=00ad,0001,0002,0003,0004,0005,0006,0007
  < 0009
  < 000a
  < 000b
@@ -47,7 +49,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -127,3 +129,5 @@
 expand ¾ to  3 / 4
 expand ß to  s s
 expand ™ to  T M
+
+# ends
Index: resources/sort/cp1255.txt
===================================================================
--- resources/sort/cp1255.txt	(revision 4915)
+++ resources/sort/cp1255.txt	(working copy)
@@ -1,6 +1,7 @@
 codepage 1255
 id1 15
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Hebrew sort"
 
 characters
@@ -49,7 +50,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -157,3 +158,5 @@
 expand װ to  ו ו
 expand ױ to  ו י
 expand ײ to  י י
+
+# ends
Index: resources/sort/cp1256.txt
===================================================================
--- resources/sort/cp1256.txt	(revision 4915)
+++ resources/sort/cp1256.txt	(working copy)
@@ -1,176 +1,176 @@
-
 codepage 1256
 id1 9
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Arabic sort"
 
 characters
 
 =0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=200c=200d=00ad=ـ=200e=200f,0001,0002,0003,0004,0005,0006,0007 ; 064b ; 064c ; 064d ; 064e ; 064f ; 0650 ; 0651 ; 0652
-< 0009
-< 000a
-< 000b
-< 000c
-< 000d
-< 0020,00a0
-< _
-< -
-< –
-< —
-< 002c
-< ،
-< 003b
-< ؛
-< :
-< !
-< ?
-< ؟
-< .
-< ·
-< '
-< ‘
-< ’
-< ‚
-< ‹
-< ›
-< "
-< “
-< ”
-< „
-< «
-< »
-< (
-< )
-< [
-< ]
-< {
-< }
-< @
-< *
-< /
-< \
-< &
-< #
-< %
-< ‰
-< †
-< ‡
-< •
-< `
-< ´
-< ^
-< ¯
-< ¨
-< ¸
-< §
-< ¶
-< ©
-< ®
-< ˆ
-< °
-< +
-< ±
-< ÷
-< ×
-< 003c
-< 003d
-< >
-< ¬
-< |
-< ¦
-< ~
-< ¤
-< ¢
-< $
-< £
-< ¥
-< €
-< 0
-< 1,¹
-< 2,²
-< 3,³
-< 4
-< 5
-< 6
-< 7
-< 8
-< 9
-< a,A ; à ; â
-< b,B
-< c,C ; ç
-< d,D
-< e,E ; é ; è ; ê ; ë
-< f,F
-< ƒ
-< g,G
-< h,H
-< i,I ; î ; ï
-< j,J
-< k,K
-< l,L
-< m,M
-< n,N
-< o,O ; ô
-< p,P
-< q,Q
-< r,R
-< s,S
-< t,T
-< u,U ; ù ; û ; ü
-< v,V
-< w,W
-< x,X
-< y,Y
-< z,Z
-< µ
-< ء
-< آ
-< أ
-< ؤ
-< إ
-< ئ
-< ا
-< ب
-< پ
-< ة
-< ت
-< ث
-< ٹ
-< ج
-< چ
-< ح
-< خ
-< د
-< ذ
-< ڈ
-< ر
-< ز
-< ڑ
-< ژ
-< س
-< ش
-< ص
-< ض
-< ط
-< ظ
-< ع
-< غ
-< ف
-< ق
-< ك
-< ک
-< گ
-< ل
-< م
-< ن
-< ں
-< ه
-< ھ
-< ہ
-< و
-< ى
-< ي
-< ے
+ < 0009
+ < 000a
+ < 000b
+ < 000c
+ < 000d
+ < 0020,00a0
+ < _
+ < -
+ < –
+ < —
+ < 002c
+ < ،
+ < 003b
+ < ؛
+ < :
+ < !
+ < ?
+ < ؟
+ < .
+ < ·
+ < '
+ < ‘
+ < ’
+ < ‚
+ < ‹
+ < ›
+ < "
+ < “
+ < ”
+ < „
+ < «
+ < »
+ < (
+ < )
+ < [
+ < ]
+ < {
+ < }
+ < @
+ < *
+ < /
+ < \
+ < &
+ < 0023
+ < %
+ < ‰
+ < †
+ < ‡
+ < •
+ < `
+ < ´
+ < ^
+ < ¯
+ < ¨
+ < ¸
+ < §
+ < ¶
+ < ©
+ < ®
+ < ˆ
+ < °
+ < +
+ < ±
+ < ÷
+ < ×
+ < 003c
+ < 003d
+ < >
+ < ¬
+ < |
+ < ¦
+ < ~
+ < ¤
+ < ¢
+ < $
+ < £
+ < ¥
+ < €
+ < 0
+ < 1,¹
+ < 2,²
+ < 3,³
+ < 4
+ < 5
+ < 6
+ < 7
+ < 8
+ < 9
+ < a,A ; à ; â
+ < b,B
+ < c,C ; ç
+ < d,D
+ < e,E ; é ; è ; ê ; ë
+ < f,F
+ < ƒ
+ < g,G
+ < h,H
+ < i,I ; î ; ï
+ < j,J
+ < k,K
+ < l,L
+ < m,M
+ < n,N
+ < o,O ; ô
+ < p,P
+ < q,Q
+ < r,R
+ < s,S
+ < t,T
+ < u,U ; ù ; û ; ü
+ < v,V
+ < w,W
+ < x,X
+ < y,Y
+ < z,Z
+ < µ
+ < ء
+ < آ
+ < أ
+ < ؤ
+ < إ
+ < ئ
+ < ا
+ < ب
+ < پ
+ < ة
+ < ت
+ < ث
+ < ٹ
+ < ج
+ < چ
+ < ح
+ < خ
+ < د
+ < ذ
+ < ڈ
+ < ر
+ < ز
+ < ڑ
+ < ژ
+ < س
+ < ش
+ < ص
+ < ض
+ < ط
+ < ظ
+ < ع
+ < غ
+ < ف
+ < ق
+ < ك
+ < ک
+ < گ
+ < ل
+ < م
+ < ن
+ < ں
+ < ه
+ < ھ
+ < ہ
+ < و
+ < ى
+ < ي
+ < ے
 
 expand … to  . . .
 expand ¼ to  1 / 4
@@ -179,3 +179,5 @@
 expand œ to  o e
 expand Œ to  O E
 expand ™ to  T M
+
+# ends
Index: resources/sort/cp1257.txt
===================================================================
--- resources/sort/cp1257.txt	(revision 4915)
+++ resources/sort/cp1257.txt	(working copy)
@@ -1,6 +1,7 @@
 codepage 1257
 id1 17
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Latin Baltic sort"
 
 characters
@@ -46,7 +47,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -127,3 +128,5 @@
 expand Æ to  A E
 expand ß to  s s
 expand ™ to  T M
+
+# ends
Index: resources/sort/cp1258.txt
===================================================================
--- resources/sort/cp1258.txt	(revision 4915)
+++ resources/sort/cp1258.txt	(working copy)
@@ -1,6 +1,7 @@
 codepage 1258
 id1 18
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Vietnamese sort"
 
 characters
@@ -48,7 +49,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -132,3 +133,5 @@
 expand Œ to  O E
 expand ß to  s s
 expand ™ to  T M
+
+# ends
Index: resources/sort/cp65001.txt
===================================================================
--- resources/sort/cp65001.txt	(revision 4915)
+++ resources/sort/cp65001.txt	(working copy)
@@ -1,3 +1,7 @@
+# use extra/src/uk/me/parabola/util/CollationRules.java to generate some of the tables.
+# This uses https://www.unicode.org/Public/UCA/latest/allkeys.txt
+# see https://www.mkgmap.org.uk/pipermail/mkgmap-dev/2021q4/033096.html
+
 codepage 65001
 id1 19
 id2 4
@@ -11133,3 +11137,5 @@
 expand ㍕ to れ む
 expand ㍖ to れ ん と こ ん
 expand ㍗ to ゎ っ と
+
+# ends
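The README change above records that cp1256.txt uses id1=9. The registry also lists id1=9 for the unimplemented cp932. Below is a small hypothetical check, not part of mkgmap, that flags id1 values claimed by more than one code page. It uses values taken from that table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of mkgmap: given a code-page -> id1 registry like
 * the table in resources/sort/README, report id1 values used by more than one code page.
 */
public class Id1RegistryCheck {

    /** Returns id1 -> code pages, keeping only id1 values that appear more than once. */
    static Map<Integer, List<Integer>> duplicateId1s(Map<Integer, Integer> codepageToId1) {
        Map<Integer, List<Integer>> byId1 = new LinkedHashMap<>();
        codepageToId1.forEach((cp, id1) ->
                byId1.computeIfAbsent(id1, k -> new ArrayList<>()).add(cp));
        // Removing through the values() view also removes the map entries.
        byId1.values().removeIf(cps -> cps.size() < 2);
        return byId1;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> registry = new LinkedHashMap<>();
        registry.put(1252, 7);
        registry.put(1256, 9);  // as found in cp1256.txt
        registry.put(932, 9);   // as listed in the README (not implemented)
        registry.put(65001, 19);
        System.out.println(duplicateId1s(registry)); // prints {9=[1256, 932]}
    }
}
```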