Hi Gerd

Last year it was reported that string ordering with the '#' character was incorrect. This was because, in the sort/cp*.txt files, the line containing the '#' was taken as a comment.
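To illustrate the failure mode described above, here is a minimal sketch of a line-oriented reader that strips '#' comments before looking at the rest of the line. It is a hypothetical, simplified model and not mkgmap's actual parser; the function name `parse_sort_line` and the token rules are assumptions made for the example. It shows why a literal '#' entry disappears while the hex form 0023 survives.

```python
def parse_sort_line(line):
    """Hypothetical, simplified reader for one line of a sort/cp*.txt-style file.

    Everything from the first '#' onwards is dropped as a comment. Each
    remaining token is read as a 4-digit hex code point or a literal character.
    This is an illustration only, not the real mkgmap implementation.
    """
    body = line.split("#", 1)[0]          # comment stripping happens first
    chars = []
    for token in body.replace("<", " ").split():
        is_hex = len(token) == 4 and all(c in "0123456789abcdefABCDEF" for c in token)
        chars.append(chr(int(token, 16)) if is_hex else token)
    return chars


before = parse_sort_line(" < #")      # '#' starts a comment, so no character is left
after = parse_sort_line(" < 0023")    # the hex spelling is not affected by stripping
other = parse_sort_line(" < %")       # ordinary literal characters are unaffected
```

With this model, `before` is an empty list, so '#' never receives a sort position, while `after` contains the single character '#'.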
I had a patch that fixed all the files, but it also attempted to do more with ß/ss and diphthongs. I've done another patch that doesn't have any contentious changes: it just fixes the '#', makes the layout consistent between the files, increments the version/id2 values and makes slight improvements to the documentation.

Ticker

On Tue, 2022-01-11 at 14:00 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> >
> > if you don't mind I'd like to postpone this patch until the active
> > branches are merged into trunk.
> >
> > Gerd
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf of
> > Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > Sent: Tuesday, 11 January 2022 11:25
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] Fix and augment sort definitions
> >
> > Hi Gerd
> >
> > Yes - gmapsupp builder gives a warning if id1/id2 are not consistent in
> > all the .img files. It is just a warning and gmapsupp is built anyway
> > and I think the warning can be ignored. gmapi doesn't notice.
> >
> > Almost all of the significant sorting where the Garmin device... needs
> > to know the sort details happens in Mdr, so this isn't a problem.
> >
> > Other uses are mostly for de-duping/efficient processing, so these
> > shouldn't matter either.
> >
> > However the LBL file does hold id1/id2 and many sections (Countries,
> > Regions, Cities, Zips, POIs) are sorted so the effect here is unknown.
> >
> > If using --latin1 / 1252, the only change in ordering is around AE/OE
> > diphthongs.
> >
> > Within the same commit or build as sortResource_v2, the attached
> > sortMashExp.patch should be applied, as it affects the binary SRT file
> > and I don't want to increment all the id2's again. This patch changes
> > the sort.expand TERTIARY mashing from 2 to 3, which is slightly more
> > consistent with the Garmin SRT binaries I've seen and allows SrtDisplay
> > to show expansions with what looks like a meaningful case.
> >
> > Ticker
> >
> > On Tue, 2022-01-11 at 06:31 +0000, Gerd Petermann wrote:
> > > > Hi Ticker,
> > > >
> > > > didn't try it: Will mkgmap complain when building an indexed
> > > > gmapi/gmapsupp where some tiles were freshly compiled with the new
> > > > version and others with an older (like Felix and Carlos do)?
> > > >
> > > > Gerd
> > > >
> > > > ________________________________________
> > > > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on behalf
> > > > of Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > > > Sent: Monday, 10 January 2022 12:04
> > > > To: Development list for mkgmap
> > > > Subject: Re: [mkgmap-dev] Fix and augment sort definitions
> > > >
> > > > Hi Gerd
> > > >
> > > > What I meant was that keyboards/devices don't normally have ways of
> > > > entering the single chars "…", "¼", "½", "¾", "™".
> > > >
> > > > Names with these might be presented by Garmin software after some
> > > > initial chars have been entered and you can then select the complete
> > > > name that contains these chars.
> > > >
> > > > I didn't see a good reason to remove the expand for these and find
> > > > some arbitrary sort PRIMARY for them. No one has complained about
> > > > them. Also cp65001 has over 1000 expands and I really don't want to
> > > > start touching these.
> > > >
> > > > Ticker
> > > >
> > > > On Mon, 2022-01-10 at 10:29 +0000, Gerd Petermann wrote:
> > > > > > Hi Ticker,
> > > > > >
> > > > > > I've committed displaySrt_v2.patch .
> > > > > >
> > > > > > I don't fully understand the comment
> > > > > > "Leave the above because no method of inputting them anyway and
> > > > > > unlikely at start of names."
> > > > > >
> > > > > > It is possible to enter these characters in MapSource and I think
> > > > > > MapSource uses MDR12 when you type only a few characters for the
> > > > > > name of a POI and don't pick up an entry from the list.
> > > > > >
> > > > > > Gerd
> > > > > >
> > > > > > ________________________________________
> > > > > > From: mkgmap-dev <mkgmap-dev-boun...@lists.mkgmap.org.uk> on
> > > > > > behalf of Ticker Berkin <rwb-mkg...@jagit.co.uk>
> > > > > > Sent: Monday, 10 January 2022 11:20
> > > > > > To: Development list for mkgmap
> > > > > > Subject: Re: [mkgmap-dev] Fix and augment sort definitions
> > > > > >
> > > > > > Hi Gerd
> > > > > >
> > > > > > I tried various approaches to fixing "Find" when the fixed length
> > > > > > Mdr17 (maybe also Mdr12) prefix contains sort.expand chars and
> > > > > > couldn't make it work. I could document these attempts in
> > > > > > Sort.java if you feel this is worthwhile.
> > > > > >
> > > > > > New patch attached that, for cp1252, leaves "ß" as its own PRIMARY
> > > > > > after "s". Moved æ,Æ etc to be PRIMARIES on the grounds that their
> > > > > > behaviour will be the same as "ß". Made cp1254 consistent as it had
> > > > > > similar partial fixes.
> > > > > >
> > > > > > The main reason for the patch is to fix all the other sort/cp*.txt
> > > > > > files that had line " < #" which was taken as a comment, resulting
> > > > > > in "#" being ignored in collation.
> > > > > >
> > > > > > With the Display patch (sent previously, but also attached here),
> > > > > > it can reproduce the resource/sort file from the binary SRT section.
> > > > > >
> > > > > > Ticker
Index: resources/sort/README
===================================================================
--- resources/sort/README (revision 4915)
+++ resources/sort/README (working copy)
@@ -35,22 +35,24 @@
 I believe that these are arbitary identifiers. Here is a registry of values
 we are using. If you make a variation on a code-page sort-order then give it
 a different id2 value.
+It is believed that having sorts with the same id1/id2 but different data loaded
+on the same device will give unexpected results

-code-page id1 id2
+code-page id1 description

-1250 12 1
-1251 8 1
-1252 7 2
-1253 13 1
-1254 14 1
-1255 15 1
-1256 16 1
-1257 17 1
-1258 18 1
-874 11 1
-932 9 1
-936 5 1
-949 10 1
+1250 12 Central European sort
+1251 8 Cyrillic sort
+1252 7 Western European sort
+1253 13 Greek sort
+1254 14 Turkish sort
+1255 15 Hebrew sort
+1256 16?9 Arabic sort cp1256.txt has id1=9, original version of this doc said 16
+1257 17 Latin Baltic sort
+1258 18 Vietnamese sort
+874 11 Thai. 8-bit not implemented
+932 9 Japanese. Shift JIS not implemented. Note id1=9 used by 1256
+936 5 Simplified Chinese not implemented
+949 10 Korean. Unified Hangui not implemented

-65001 19 4
-0 0 0
+65001 19 Unicode sort
+0 0 ASCII 7-bit sort
Index: resources/sort/cp0.txt
===================================================================
--- resources/sort/cp0.txt (revision 4915)
+++ resources/sort/cp0.txt (working copy)
@@ -1,9 +1,11 @@
 codepage 0
 id1 0
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "ASCII 7-bit sort"

 characters
+
 =0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f,0001,0002,0003,0004,0005,0006,0007
  < 0009
  < 000a
@@ -32,7 +34,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < `
  < ^
@@ -79,3 +81,5 @@
  < x,X
  < y,Y
  < z,Z
+
+# ends
Index: resources/sort/cp1250.txt
===================================================================
--- resources/sort/cp1250.txt (revision 4915)
+++ resources/sort/cp1250.txt (working copy)
@@ -1,9 +1,11 @@
 codepage 1250
 id1 12
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Central European sort"

 characters
+
 =0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=00ad,0001,0002,0003,0004,0005,0006,0007
  < 0009
  < 000a
@@ -45,7 +47,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -120,3 +122,5 @@
 expand ˛ to § 0020
 expand ß to s s
 expand ™ to T M
+
+# ends
Index: resources/sort/cp1251.txt
===================================================================
--- resources/sort/cp1251.txt (revision 4915)
+++ resources/sort/cp1251.txt (working copy)
@@ -1,9 +1,11 @@
 codepage 1251
 id1 8
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Cyrillic sort"

 characters
+
 =0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=00ad,0001,0002,0003,0004,0005,0006,0007
  < 0009
  < 000a
@@ -45,7 +47,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -152,7 +154,8 @@
  < э,Э
  < ю,Ю
  < я,Я
-
 expand … to . . .
 expand № to N o
 expand ™ to T M
+
+# ends
Index: resources/sort/cp1253.txt
===================================================================
--- resources/sort/cp1253.txt (revision 4915)
+++ resources/sort/cp1253.txt (working copy)
@@ -1,6 +1,7 @@
 codepage 1253
 id1 13
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Greek sort"

 characters
@@ -47,7 +48,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -140,3 +141,5 @@
 expand … to . . .
 expand ½ to 1 / 2
 expand ™ to T M
+
+# ends
Index: resources/sort/cp1254.txt
===================================================================
--- resources/sort/cp1254.txt (revision 4915)
+++ resources/sort/cp1254.txt (working copy)
@@ -1,10 +1,12 @@
 codepage 1254
 id1 14
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Turkish sort"

 characters
-= 0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=00ad,0001,0002,0003,0004,0005,0006,0007
+
+=0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=00ad,0001,0002,0003,0004,0005,0006,0007
  < 0009
  < 000a
  < 000b
@@ -47,7 +49,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -127,3 +129,5 @@
 expand ¾ to 3 / 4
 expand ß to s s
 expand ™ to T M
+
+# ends
Index: resources/sort/cp1255.txt
===================================================================
--- resources/sort/cp1255.txt (revision 4915)
+++ resources/sort/cp1255.txt (working copy)
@@ -1,6 +1,7 @@
 codepage 1255
 id1 15
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Hebrew sort"

 characters
@@ -49,7 +50,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -157,3 +158,5 @@
 expand װ to ו ו
 expand ױ to ו י
 expand ײ to י י
+
+# ends
Index: resources/sort/cp1256.txt
===================================================================
--- resources/sort/cp1256.txt (revision 4915)
+++ resources/sort/cp1256.txt (working copy)
@@ -1,176 +1,176 @@
-
 codepage 1256
 id1 9
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Arabic sort"

 characters

 =0008=000e=000f=0010=0011=0012=0013=0014=0015=0016=0017=0018=0019=001a=001b=001c=001d=001e=001f=007f=200c=200d=00ad=ـ=200e=200f,0001,0002,0003,0004,0005,0006,0007 ; 064b ; 064c ; 064d ; 064e ; 064f ; 0650 ; 0651 ; 0652
-< 0009
-< 000a
-< 000b
-< 000c
-< 000d
-< 0020,00a0
-< _
-< -
-< –
-< —
-< 002c
-< ،
-< 003b
-< ؛
-< :
-< !
-< ?
-< ؟
-< .
-< ·
-< '
-< ‘
-< ’
-< ‚
-< ‹
-< ›
-< "
-< “
-< ”
-< „
-< «
-< »
-< (
-< )
-< [
-< ]
-< {
-< }
-< @
-< *
-< /
-< \
-< &
-< #
-< %
-< ‰
-< †
-< ‡
-< •
-< `
-< ´
-< ^
-< ¯
-< ¨
-< ¸
-< §
-< ¶
-< ©
-< ®
-< ˆ
-< °
-< +
-< ±
-< ÷
-< ×
-< 003c
-< 003d
-< >
-< ¬
-< |
-< ¦
-< ~
-< ¤
-< ¢
-< $
-< £
-< ¥
-< €
-< 0
-< 1,¹
-< 2,²
-< 3,³
-< 4
-< 5
-< 6
-< 7
-< 8
-< 9
-< a,A ; à ; â
-< b,B
-< c,C ; ç
-< d,D
-< e,E ; é ; è ; ê ; ë
-< f,F
-< ƒ
-< g,G
-< h,H
-< i,I ; î ; ï
-< j,J
-< k,K
-< l,L
-< m,M
-< n,N
-< o,O ; ô
-< p,P
-< q,Q
-< r,R
-< s,S
-< t,T
-< u,U ; ù ; û ; ü
-< v,V
-< w,W
-< x,X
-< y,Y
-< z,Z
-< µ
-< ء
-< آ
-< أ
-< ؤ
-< إ
-< ئ
-< ا
-< ب
-< پ
-< ة
-< ت
-< ث
-< ٹ
-< ج
-< چ
-< ح
-< خ
-< د
-< ذ
-< ڈ
-< ر
-< ز
-< ڑ
-< ژ
-< س
-< ش
-< ص
-< ض
-< ط
-< ظ
-< ع
-< غ
-< ف
-< ق
-< ك
-< ک
-< گ
-< ل
-< م
-< ن
-< ں
-< ه
-< ھ
-< ہ
-< و
-< ى
-< ي
-< ے
+ < 0009
+ < 000a
+ < 000b
+ < 000c
+ < 000d
+ < 0020,00a0
+ < _
+ < -
+ < –
+ < —
+ < 002c
+ < ،
+ < 003b
+ < ؛
+ < :
+ < !
+ < ?
+ < ؟
+ < .
+ < ·
+ < '
+ < ‘
+ < ’
+ < ‚
+ < ‹
+ < ›
+ < "
+ < “
+ < ”
+ < „
+ < «
+ < »
+ < (
+ < )
+ < [
+ < ]
+ < {
+ < }
+ < @
+ < *
+ < /
+ < \
+ < &
+ < 0023
+ < %
+ < ‰
+ < †
+ < ‡
+ < •
+ < `
+ < ´
+ < ^
+ < ¯
+ < ¨
+ < ¸
+ < §
+ < ¶
+ < ©
+ < ®
+ < ˆ
+ < °
+ < +
+ < ±
+ < ÷
+ < ×
+ < 003c
+ < 003d
+ < >
+ < ¬
+ < |
+ < ¦
+ < ~
+ < ¤
+ < ¢
+ < $
+ < £
+ < ¥
+ < €
+ < 0
+ < 1,¹
+ < 2,²
+ < 3,³
+ < 4
+ < 5
+ < 6
+ < 7
+ < 8
+ < 9
+ < a,A ; à ; â
+ < b,B
+ < c,C ; ç
+ < d,D
+ < e,E ; é ; è ; ê ; ë
+ < f,F
+ < ƒ
+ < g,G
+ < h,H
+ < i,I ; î ; ï
+ < j,J
+ < k,K
+ < l,L
+ < m,M
+ < n,N
+ < o,O ; ô
+ < p,P
+ < q,Q
+ < r,R
+ < s,S
+ < t,T
+ < u,U ; ù ; û ; ü
+ < v,V
+ < w,W
+ < x,X
+ < y,Y
+ < z,Z
+ < µ
+ < ء
+ < آ
+ < أ
+ < ؤ
+ < إ
+ < ئ
+ < ا
+ < ب
+ < پ
+ < ة
+ < ت
+ < ث
+ < ٹ
+ < ج
+ < چ
+ < ح
+ < خ
+ < د
+ < ذ
+ < ڈ
+ < ر
+ < ز
+ < ڑ
+ < ژ
+ < س
+ < ش
+ < ص
+ < ض
+ < ط
+ < ظ
+ < ع
+ < غ
+ < ف
+ < ق
+ < ك
+ < ک
+ < گ
+ < ل
+ < م
+ < ن
+ < ں
+ < ه
+ < ھ
+ < ہ
+ < و
+ < ى
+ < ي
+ < ے

 expand … to . . .
 expand ¼ to 1 / 4
@@ -179,3 +179,5 @@
 expand œ to o e
 expand Œ to O E
 expand ™ to T M
+
+# ends
Index: resources/sort/cp1257.txt
===================================================================
--- resources/sort/cp1257.txt (revision 4915)
+++ resources/sort/cp1257.txt (working copy)
@@ -1,6 +1,7 @@
 codepage 1257
 id1 17
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Latin Baltic sort"

 characters
@@ -46,7 +47,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -127,3 +128,5 @@
 expand Æ to A E
 expand ß to s s
 expand ™ to T M
+
+# ends
Index: resources/sort/cp1258.txt
===================================================================
--- resources/sort/cp1258.txt (revision 4915)
+++ resources/sort/cp1258.txt (working copy)
@@ -1,6 +1,7 @@
 codepage 1258
 id1 18
-id2 1
+# 10-Jan-2022 Increment id2/version. Fix '#' to 0023
+id2 2
 description "Vietnamese sort"

 characters
@@ -48,7 +49,7 @@
  < /
  < \
  < &
- < #
+ < 0023
  < %
  < ‰
  < †
@@ -132,3 +133,5 @@
 expand Œ to O E
 expand ß to s s
 expand ™ to T M
+
+# ends
Index: resources/sort/cp65001.txt
===================================================================
--- resources/sort/cp65001.txt (revision 4915)
+++ resources/sort/cp65001.txt (working copy)
@@ -1,3 +1,7 @@
+# use extra/src/uk/me/parabola/util/CollationRules.java to generate some of the tables.
+# This uses https://www.unicode.org/Public/UCA/latest/allkeys.txt
+# see https://www.mkgmap.org.uk/pipermail/mkgmap-dev/2021q4/033096.html
+
 codepage 65001
 id1 19
 id2 4
@@ -11133,3 +11137,5 @@
 expand ㍕ to れ む
 expand ㍖ to れ ん と こ ん
 expand ㍗ to ゎ っ と
+
+# ends
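
The thread raises the case of an indexed gmapsupp built from tiles compiled before and after the id2 increment, where the builder warns if id1/id2 are not consistent. As a rough sketch of what such a consistency check amounts to, the following groups tiles by their (id1, id2) pair. It is not mkgmap's code: the function name, the tuple layout and the tile names are hypothetical, and the id values are taken from the cp1250 hunk purely for illustration.

```python
def group_tiles_by_sort_id(tiles):
    """Group tile names by their (id1, id2) sort identifiers.

    `tiles` is an iterable of (name, id1, id2) tuples. This input shape is
    an assumption for the example; mkgmap reads these values from the
    .img files themselves.
    """
    groups = {}
    for name, id1, id2 in tiles:
        groups.setdefault((id1, id2), []).append(name)
    return groups


# Hypothetical mix: one tile compiled before the patch (cp1250, id2 = 1)
# and one compiled after it (id2 = 2).
groups = group_tiles_by_sort_id([
    ("old_tile.img", 12, 1),
    ("new_tile.img", 12, 2),
])
consistent = len(groups) <= 1   # more than one pair means a mixed build
```

Here `consistent` is False because two different (id1, id2) pairs are present, which corresponds to the situation in which the warning would be expected.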
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev