Re: [NTG-context] two buglets

2010-10-06 Thread Hans Hagen

On 5-10-2010 11:55, Philipp Gesang wrote:


> I assume by “shapes” you mean the base symbol (all diacritics
> stripped).


indeed (and we might need to add or patch a few more shcodes in
char-def.lua)
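
As a rough illustration of what "shape" means here, the sketch below strips diacritics through Unicode canonical decomposition. It is a standalone Python sketch, not ConTeXt code: ConTeXt takes its shape codes from the shcode fields in char-def.lua, whereas `shape` is a hypothetical helper built on Python's `unicodedata` module, so the two need not agree for every character.

```python
import unicodedata

def shape(text):
    """Return text with combining marks removed after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(shape("\u1f84"))  # alpha with psili, oxia and ypogegrammeni
print(shape("öř"))
```

Characters without a canonical decomposition (for instance "ß" or "ø") pass through unchanged, which is one reason a hand-maintained table such as the shcodes can still be needed.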


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___


Re: [NTG-context] two buglets

2010-10-05 Thread Philipp Gesang
On 2010-10-03 17:43:21, Thomas A. Schmitz wrote:
> OK, I'll write something for German and English, but the thing
> is that we need more input what users expect. For mixtures with
> foreign languages, there might not be generally accepted rules at
> all, so people will define something on an ad-hoc basis.

Hi Thomas and others,

technically speaking, the problem is solved by ISO 14651.[1]

In practice, multilingual sorting depends on local rules, of
which “one index per script or language” seems to be the most
common.

Some time ago I made an lpeg grammar from the BNF in [1]. It matches
the collation rules from [2], but as I couldn’t figure out how to
map them onto ConTeXt’s sorting mechanism, I never got around to
actually capturing the information. As I won’t have the time to try
it with the new structure of sort-lan, I’ll just attach the peg
grammar for anyone to use as a starting point. Unicode collation
would be great to have in ConTeXt.
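
To make "capturing the information" a little more concrete, here is a minimal sketch of reading one weight-assignment line of the kind the grammar's symbol_weight rule describes (a symbol, whitespace, then a semicolon-separated weight list). It is standalone Python, not the attached lpeg grammar. The sample line is only illustrative of the table's general shape, and both the helper name `parse_weight_line` and the treatment of `%` as a comment character are assumptions.

```python
import re

SYMBOL = r"<[^<>\s]+>"          # <identifier> or <Uxxxx>
LINE = re.compile(rf"^(?P<symbol>{SYMBOL})\s+(?P<weights>\S+)")

def parse_weight_line(line):
    """Split '<symbol> w1;w2;...' into (symbol, [levels]); None for other lines."""
    line = line.split("%", 1)[0].strip()   # assumption: '%' starts a comment
    found = LINE.match(line)
    if found is None:
        return None
    return found.group("symbol"), found.group("weights").split(";")

# Illustrative line, not copied from the table in [2].
sample = "<U0061> <S0061>;<BASE>;<MIN>;<U0061> % LATIN SMALL LETTER A"
print(parse_weight_line(sample))
```

Each returned level list would still have to be translated into whatever ordering vectors the sorter uses, which is the step left open above.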

> transliteration. The problem with polytonic Greek is that so many
> different unicode characters need to have the same sort entry. If

Isn’t that just what the Greek rules in sort-lan.lua do? If not,
then it would be a bug.

startsnippet

definitions["gr"] = {
    entries = {
        ["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
        ["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α", ["ἆ"] = "α",
        ["ἁ"] = "α", ["ἅ"] = "α", ["ἃ"] = "α", ["ἇ"] = "α", ["ᾁ"] = "α",
        ["ᾴ"] = "α", ["ᾲ"] = "α", ["ᾷ"] = "α", ["ᾄ"] = "α", ["ᾂ"] = "α",
        ["ᾅ"] = "α", ["ᾃ"] = "α", ["ᾆ"] = "α", ["ᾇ"] = "α", ["β"] = "β",

stopsnippet

Always nice to have a decent discussion on sorting ;)

Philipp


[1] 
http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip
[2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

require "lpeg"

local C, Cs, Ct, P, R, S, V, match = lpeg.C, lpeg.Cs, lpeg.Ct, lpeg.P, lpeg.R,
                                     lpeg.S, lpeg.V, lpeg.match

local iso_parser

rules = P{
    [1] = "weight_table",

    -- Define collation tables as sequences of lines

    weight_table = V"common_template_table" + V"tailored_table",
    common_template_table = V"simple_line"^0,
    tailored_table = V"table_line"^0,

    -- Define the line types

    simple_line = (V"symbol_definition" + V"collating_element" +
                   V"weight_assignment" + V"order_end")^-1 * V"line_completion"
                  --/ function (first) io.write("simple: " .. first) end
                  ,
    --table_line = V"simple_line" + V"tailoring_line",
    table_line = V"tailoring_line" + V"simple_line",
    tailoring_line = (V"reorder_after" + V"order_start" + V"reorder_end" +
                      V"section_definition" + V"reorder_section_after") *
                     V"line_completion"
                     --/ function (first) io.write("tailoring: " .. first) end
                     ,

    -- Define the basic syntax for collation weighting

    symbol_definition = P"collating-symbol" * V"space"^1 * V"symbol_element",
    symbol_element = V"symbol" - V"symbol_range" + V"symbol_range",
    symbol_range = V"symbol" * P".." * V"symbol",
    symbol = V"simple_symbol" + V"ucs_symbol",
    ucs_symbol = (P"<U"  * V"one_to_eight_digit_hex_string" * P">") +
                 (P"<U-" * V"one_to_eight_digit_hex_string" * P">"),
    simple_symbol = P"<" * V"identifier" * P">",
    collating_element = P"collating-element" * V"space"^1 * V"symbol" *
                        V"space"^1 *
                        P"from" * V"space"^1 * V"quoted_symbol_sequence",
    quoted_symbol_sequence = P'"' * V"simple_weight"^1 * P'"',
    --weight_assignment = V"simple_weight" + V"symbol_weight",
    weight_assignment = V"symbol_weight" + V"simple_weight",
    simple_weight = V"symbol_element" + P"UNDEFINED",
    symbol_weight = V"symbol_element" * V"space"^1 * V"weight_list",
    weight_list = V"level_token" * (V"semicolon" * V"level_token")^0,
    level_token = V"symbol_group" + P"IGNORE",
    symbol_group = V"symbol_element" + V"quoted_symbol_sequence",
    order_end = P"order_end",

    -- Define the tailoring syntax

    reorder_after = P"reorder-after" * V"space"^1 * V"target_symbol",
    target_symbol = V"symbol",
    order_start = P"order_start" * V"space"^1 * V"multiple_level_direction",
    multiple_level_direction = V"direction" * (V"semicolon" * V"direction")^0 *
                               P",position"^-1,
    direction = P"forward" + P"backward",
    reorder_end = P"reorder-end",
    section_definition = V"section_definition_simple" +
                         V"section_definition_list",
    section_definition_simple = P"section" * V"space"^1 * V"section_identifier",
    section_identifier = V"identifier",
    section_definition_list = P"section" * V"space"^1 * V"section_identifier" *
                              V"space"^1 * V"symbol_list",
    symbol_list = V"symbol_element" * (V"semicolon" * V"symbol_element")^0,
    reorder_section_after = P"reorder-section-after" * V"space"^1 *
                            V"section_identifier" * V"space"^1 * V"target_symbol",

    -- Define low-level tokens used by the rest of the syntax

    identifier = (V"letter" + V"digit") * V"id_part"^0,
    id_part = V"letter" + V"digit" + S"-_",
    line_completion = V"space"^0

Re: [NTG-context] two buglets

2010-10-05 Thread Hans Hagen

On 5-10-2010 2:15, Philipp Gesang wrote:


> [1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_14651_2007(E).zip
> [2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt


I'll have a look at it when I've time for it (I didn't know that doc; 
it's more fun figuring it out oneself anyway).


Hans



Re: [NTG-context] two buglets

2010-10-05 Thread Thomas A. Schmitz

On Oct 5, 2010, at 2:15 PM, Philipp Gesang wrote:

> Hi Thomas and others,
>
> technically speaking the problem is solved by ISO 14651.[1]
>
> In praxi multilingual sorting depends on local rules, of
> which “One index per script|language.” seems to be the most
> common.

Yes, that's what I was trying to say. In practice, hardly anyone will want an
individual index for Spanish if they have just two Spanish words in an English
book. And someone (me) might say that they want three Greek terms in their
German index at logical places.

 
> Some time ago I made an lpeg from the bnf in [1]. It matches the
> collation rules from [2], but as I couldn’t figure out how to map
> them onto context’s sorting mechanism I never got around to
> actually capture the information. As I won’t be having the time
> to try it with the new structure of sort-lan I guess I’ll just
> attach the peg grammar for anyone to use as a starting point.
> Unicode collation would be great to have in context.
>
>> transliteration. The problem with polytonic Greek is that so many
>> different unicode characters need to have the same sort entry. If
>
> Isn’t that just what the Greek rules in sort-lan.lua do? If not
> then it would be a bug.
 
Oh yes, you're right, I missed that. Thanks for pointing that out!

Thomas



Re: [NTG-context] two buglets

2010-10-05 Thread Philipp Gesang
On 2010-10-05 15:29:38, Thomas A. Schmitz wrote:
> And someone (me) might
> say that they want three Greek terms in their German index at
> logical places.

Try the definitions in the attachment. For only three words they
will be fine. But as the count increases, you will soon run into
situations where it’s not easy to determine where those “logical
places” are. E.g. would you want the letter “υ” under Latin “y”
or “u”? Phonologically (this might depend on your stance on
historical phonology, which could be a minefield) you might find
it reasonable to treat “ου” as “u” (or “ū” if that matters), but
your audience might expect it at the graphetic location, Latin
“ou”, instead. As you can see in the example, when both omega and
omicron are mapped onto Latin “o”, “χρῶμα” appears before
“Χρόνος”, which looks a bit odd.
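
The omega/omicron effect can be reproduced in a few lines. This is a standalone Python sketch, not the attached ConTeXt definitions: `LATIN` is a hypothetical subset of such a replacement table, and diacritics are stripped with `unicodedata` instead of being listed one by one.

```python
import unicodedata

# Hypothetical subset of a Greek-to-Latin replacement table.
LATIN = {"α": "a", "μ": "m", "ν": "n", "ο": "o", "ρ": "r",
         "σ": "s", "ς": "s", "χ": "ch", "ω": "o"}

# Position of each base letter in the Greek alphabet; final sigma sorts as sigma.
GREEK_ORDER = {ch: i for i, ch in enumerate("αβγδεζηθικλμνξοπρστυφχψω")}
GREEK_ORDER["ς"] = GREEK_ORDER["σ"]

def base_letters(word):
    """Lowercase the word and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return [ch for ch in decomposed if not unicodedata.combining(ch)]

def latin_key(word):
    return "".join(LATIN[ch] for ch in base_letters(word))

def greek_key(word):
    return [GREEK_ORDER[ch] for ch in base_letters(word)]

words = ["Χρόνος", "χρῶμα"]
print(sorted(words, key=latin_key))   # transliterated: "chroma" before "chronos"
print(sorted(words, key=greek_key))   # Greek alphabet: omicron before omega
```

Once both vowels collapse to "o", the third letter no longer separates the words and the decision falls to "m" versus "n".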

This ad-hoc solution is troublesome when two words (a German and
a Greek one) occupy the same spot in the sort order, like
“Polyneikes” and “Πολυνείκης”. My index output is:

Polyneikes 2
Πολυνείκης 2
Polyneikes 3
Πολυνείκης 3

which should rather be

Polyneikes 2, 3
Πολυνείκης 2, 3

I guess there is some testing going on in order to determine
whether to proceed with the current entry or switch to the next
one. The position is the same, however the comparison with the
last item fails and a new one is created instead. (Only
guessing.)

If you run into this problem you might have to ask Hans for
advice.
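
The guess above can be modelled in a few lines. This is a standalone Python sketch of the suspected mechanism, not ConTeXt's register code: `SORT_KEY` and `collate` are hypothetical, and merging is assumed to happen only between adjacent entries with identical text.

```python
from itertools import groupby

# Hypothetical transliteration result: both words collapse to the same key.
SORT_KEY = {"Polyneikes": "polyneikes", "Πολυνείκης": "polyneikes"}

entries = [("Polyneikes", 2), ("Πολυνείκης", 2),
           ("Polyneikes", 3), ("Πολυνείκης", 3)]

def collate(entries, key):
    """Sort the entries, then merge adjacent ones that show the same text."""
    ordered = sorted(entries, key=key)
    return [(text, [page for _, page in group])
            for text, group in groupby(ordered, key=lambda entry: entry[0])]

# Shared key and page only: the two words interleave and nothing merges.
naive = collate(entries, key=lambda e: (SORT_KEY[e[0]], e[1]))

# Raw text as a tie-breaker before the page: each word stays together.
fixed = collate(entries, key=lambda e: (SORT_KEY[e[0]], e[0], e[1]))

print(naive)
print(fixed)
```

Under these assumptions the first call yields four separate entries and the second yields the two merged ones.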

Hth,

Philipp


-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

\startluacode
sorters.definitions["de-gr"] = {
    method  = "before",
    replacements = {
        -- German
        { "ä", 'ae' }, { "Ä", 'Ae' },
        { "ö", 'oe' }, { "Ö", 'Oe' },
        { "ü", 'ue' }, { "Ü", 'Ue' },
        { "ß", 'ss' },

        -- Greek
        { "α", "a"  }, { "ά", "a"  }, { "ὰ", "a"  }, { "ᾶ", "a"  }, { "ᾳ", "a"  },
        { "ἀ", "a"  }, { "ἁ", "a"  }, { "ἄ", "a"  }, { "ἂ", "a"  }, { "ἆ", "a"  },
        { "ἁ", "a"  }, { "ἅ", "a"  }, { "ἃ", "a"  }, { "ἇ", "a"  }, { "ᾁ", "a"  },
        { "ᾴ", "a"  }, { "ᾲ", "a"  }, { "ᾷ", "a"  }, { "ᾄ", "a"  }, { "ᾂ", "a"  },
        { "ᾅ", "a"  }, { "ᾃ", "a"  }, { "ᾆ", "a"  }, { "ᾇ", "a"  }, { "β", "b"  },
        { "γ", "g"  }, { "δ", "d"  }, { "ε", "e"  }, { "έ", "e"  }, { "ὲ", "e"  },
        { "ἐ", "e"  }, { "ἔ", "e"  }, { "ἒ", "e"  }, { "ἑ", "e"  }, { "ἕ", "e"  },
        { "ἓ", "e"  }, { "ζ", "z"  }, { "η", "e"  }, { "η", "e"  }, { "ή", "e"  },
        { "ὴ", "e"  }, { "ῆ", "e"  }, { "ῃ", "e"  }, { "ἠ", "e"  }, { "ἤ", "e"  },
        { "ἢ", "e"  }, { "ἦ", "e"  }, { "ᾐ", "e"  }, { "ἡ", "e"  }, { "ἥ", "e"  },
        { "ἣ", "e"  }, { "ἧ", "e"  }, { "ᾑ", "e"  }, { "ῄ", "e"  }, { "ῂ", "e"  },
        { "ῇ", "e"  }, { "ᾔ", "e"  }, { "ᾒ", "e"  }, { "ᾕ", "e"  }, { "ᾓ", "e"  },
        { "ᾖ", "e"  }, { "ᾗ", "e"  }, { "θ", "th" }, { "ι", "i"  }, { "ί", "i"  },
        { "ὶ", "i"  }, { "ῖ", "i"  }, { "ἰ", "i"  }, { "ἴ", "i"  }, { "ἲ", "i"  },
        { "ἶ", "i"  }, { "ἱ", "i"  }, { "ἵ", "i"  }, { "ἳ", "i"  }, { "ἷ", "i"  },
        { "ϊ", "i"  }, { "ΐ", "i"  }, { "ῒ", "i"  }, { "ῗ", "i"  }, { "κ", "k"  },
        { "λ", "l"  }, { "μ", "m"  }, { "ν", "n"  }, { "ξ", "x"  }, { "ο", "o"  },
        { "ό", "o"  }, { "ὸ", "o"  }, { "ὀ", "o"  }, { "ὄ", "o"  }, { "ὂ", "o"  },
        { "ὁ", "o"  }, { "ὅ", "o"  }, { "ὃ", "o"  }, { "π", "p"  }, { "ρ", "r"  },
        { "ῤ", "r"  }, { "ῥ", "r"  }, { "σ", "s"  }, { "ς", "s"  }, { "τ", "t"  },
        { "υ", "y"  }, { "ύ", "y"  }, { "ὺ", "y"  }, { "ῦ", "y"  }, { "ὐ", "y"  },
        { "ὔ", "y"  }, { "ὒ", "y"  }, { "ὖ", "y"  }, { "ὑ", "y"  }, { "ὕ", "y"  },
        { "ὓ", "y"  }, { "ὗ", "y"  }, { "ϋ", "y"  }, { "ΰ", "y"  }, { "ῢ", "y"  },
        { "ῧ", "y"  }, { "φ", "ph" }, { "χ", "ch" }, { "ψ", "ps" }, { "ω", "o"  },
        { "ώ", "o"  }, { "ὼ", "o"  }, { "ῶ", "o"  }, { "ῳ", "o"  }, { "ὠ", "o"  },
        { "ὤ", "o"  }, { "ὢ", "o"  }, { "ὦ", "o"  }, { "ᾠ", "o"  }, { "ὡ", "o"  },
        { "ὥ", "o"  }, { "ὣ", "o"  }, { "ὧ", "o"  }, { "ᾡ", "o"  }, { "ῴ", "o"  },
        { "ῲ", "o"  }, { "ῷ", "o"  }, { "ᾤ", "o"  }, { "ᾢ", "o"  }, { "ᾥ", "o"  },
        { "ᾣ", "o"  }, { "ᾦ", "o"  }, { "ᾧ", "o"  },

        { "Α", "A"  }, { "Ά", "A"  }, { "Ὰ", "A"  }, { "ᾼ", "A"  }, { "Ἀ", "A"  },
        { "Ἁ", "A"  }, { "Ἄ", "A"  }, { "Ἂ", "A"  }, { "Ἆ", "A"  }, { "Ἁ", "A"  },
        { "Ἅ", "A"  }, { "Ἃ", "A"  }, { "Ἇ", "A"  }, { "ᾉ", "A"  }, { "ᾌ", "A"  },
        { "ᾊ", "A"  }, { "ᾍ", "A"  }, { "ᾋ", "A"  }, { "ᾎ", "A"  }, { "ᾏ", "A"  },
        { "Β", "B"  }, { "Γ", "G"  }, { "Δ", "D"  }, { "Ε", "E"  }, { "Έ", "E"  },
        { "Ὲ", "E"  }, { "Ἐ", "E"  }, { "Ἔ", "E"  }, { "Ἒ", "E"  }, { "Ἑ", "E"  },
        { "Ἕ", "E"  }, { "Ἓ", "E"  }, { "Ζ", "Z"  }, { "Η", "E"  }, { "Η", "E"  },
        { "Ή", "E"  }, { "Ὴ", "E"  }, { "ῌ", "E"  }, { "Ἠ", "E"  }, { "Ἤ", "E"  },
        { "Ἢ", "E"  }, { "Ἦ", "E"  }, { "ᾘ", "E"  }, { "Ἡ", "E"  }, { "Ἥ", "E"  },
        { "Ἣ", "E"  }, { "Ἧ", "E"  }, { "ᾙ", "E"  }, { "ᾜ", "E"  }, { "ᾚ", "E"  },
        { "ᾝ", "E"  }, { "ᾛ", "E"  }, { "ᾞ", "E"  }, { "ᾟ", "E"  }, { "Θ", "Th" },
        { "Ι", "I"  }, { "Ί", "I"  }, { "Ὶ", "I"  }, { "Ἰ", "I"  }, { "Ἴ", "I"  },
        { "Ἲ", "I"  }, { "Ἶ", "I"  }, { "Ἱ", "I"  }, { "Ἵ", "I"  }, { "Ἳ", "I"  },
        { "Ἷ", "I"  }, { "Ϊ", "I"  }, { "Κ", "K"  }, { "Λ", "L"  }, { "Μ", "M"  },
        { "Ν", "N"  }, { "Ξ", "X"  }, { "Ο", "O"  }, { "Ό", "O"  }, { "Ὸ", "O"  },
        { "Ὀ", "O"  }, { "Ὄ", "O"  }, { "Ὂ", "O"  }, { "Ὁ", "O"  }, { "Ὅ", "O"  },
        { "Ὃ", "O"  }, { "Π", "P"  }, { "Ρ", "R"  }, { "Ῥ", "R"  }, { "Σ", "S"  },
        { "Σ", "S"  }, { "Τ", "T"  }, { "Υ", "Y"  }, { "Ύ", "Y"  }, { "Ὺ", "Y"  },
        { "Ὑ", "Y"  }, { "Ὕ", "Y"  }, { "Ὓ", "Y"  }, { "Ὗ", "Y"  }, { "Ϋ", "Y"  },
        { "Φ", "Ph" }, { "Χ", "Ch" }, { "Ψ", "Ps" }, { "Ω", "O"  }, { "Ώ", "O"  },
        { "Ὼ", "O"  }, { "ῼ", "O"  }, { "Ὠ", "O"  }, { "Ὤ", "O"  }, { "Ὢ", "O"  },

Re: [NTG-context] two buglets

2010-10-05 Thread Hans Hagen

On 5-10-2010 11:17, Philipp Gesang wrote:


> I guess there is some testing going on in order to determine
> whether to proceed with the current entry or switch to the next
> one. The position is the same, however the comparison with the
> last item fails and a new one is created instead. (Only
> guessing.)


it's a sequence of tests per comparison, like

Polyneikes
polyneikes % lowercased
polyneikes % shapes
Polyneikes % unicode

Πολυνείκης
Πολυνείκης % lowercased
polyneikes % shapes
Πολυνείκης % unicode

casing and shapes depend on the mapping vectors and the order can be
influenced; you can see this in action with


\enabletrackers[sorters.tests]
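
The idea of a sequence of tests per comparison can be sketched as follows. This is standalone Python, not the code in sort-ini.lua: `GREEK_SHAPES`, `views` and `compare` are hypothetical, the shape vector covers only the letters needed here, and Python's own `lower()` stands in for the casing vector (so the lowercased Greek form differs from the listing above).

```python
import unicodedata

# Hypothetical Greek-to-Latin shape vector, only for the letters used below.
GREEK_SHAPES = str.maketrans("πολυνεικης", "polyneikes")

def views(word):
    """Return the forms of a word that the individual tests compare."""
    lowered = word.lower()
    stripped = "".join(ch for ch in unicodedata.normalize("NFD", lowered)
                       if not unicodedata.combining(ch))
    return {"lowercased": lowered,
            "shapes": stripped.translate(GREEK_SHAPES),
            "unicode": word}

def compare(a, b, order):
    """Run the tests in order; the first one that tells a and b apart decides."""
    va, vb = views(a), views(b)
    for test in order:
        if va[test] != vb[test]:
            return test, (-1 if va[test] < vb[test] else 1)
    return None, 0

print(compare("Polyneikes", "Πολυνείκης", ("shapes", "unicode")))
print(compare("Polyneikes", "Πολυνείκης", ("lowercased", "shapes", "unicode")))
```

With shapes first, the two words tie on that test and only the final unicode test separates them; with lowercased first, the decision is made one step earlier.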

Hans




Re: [NTG-context] two buglets

2010-10-03 Thread Thomas A. Schmitz
Hi all, Hans,


On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:

>> 1. index sorts uppercase letters after lowercase letters. Minimal example:
>>
>> \starttext
>>
>> \index{Aardvark}Aardvark
>>
>> \index{azygous}azygous
>>
>> \page
>>
>> \setupregister[index][n=1]
>> \placeregister[index]
>>
>> \stoptext
>>
>> I would expect azygous to follow Aardvark, but it is sorted before.
>
> are you sure that that's the convention for english? it's easy to change it
> ...
>
> \startluacode
> sorters.mappings['en'] = {
>     ["a"] =  2, ["b"] =  4, ["c"] =  6, ["d"] =  8, ["e"] = 10,
>     ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
>     ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
>     ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
>     ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
>     ["z"] = 52,
>     ["A"] =  1, ["B"] =  3, ["C"] =  5, ["D"] =  7, ["E"] =  9,
>     ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
>     ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
>     ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
>     ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
>     ["Z"] = 51,
> }
> \stopluacode
>
> \starttext
> \index{Aardvark}Aardvark \par
> \index{azygous}azygous
> \placeregister[index][n=1]
> \stoptext
 

we had this pretty old thread about sorting in indexes. AFAICS, the latest beta
defaults to case-sensitive sorting. Two quick questions:

1. Is there a setup command that will make index sorting case-insensitive? The 
code above doesn't work anymore, so maybe you made it user-configurable now?

2. Is it really a good idea to make case-sensitive sorting the default in 
English? I can't remember seeing a single academic book in English that has 
this sort of index sorting.

All best

Thomas


Re: [NTG-context] two buglets

2010-10-03 Thread Hans Hagen

On 3-10-2010 10:24, Thomas A. Schmitz wrote:


> we had this pretty old thread about sorting in indexes. AFAICS, the latest beta
> defaults to case-sensitive sorting. Two quick questions:
>
> 1. Is there a setup command that will make index sorting case-insensitive? The
> code above doesn't work anymore, so maybe you made it user-configurable now?


indeed, and in a nice obscure way ...

\setuplayout[topspace=1cm,height=middle]

\setupbodyfont[11pt]

\starttext

\def\Test#1%
  {\vbox{{\bf#1}\blank\placeregister[index][language=cz,n=1,method={#1}]}\blank}

wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank

\startcolumns[n=3]
\Test{mc,mm,uc} \Test{mc,zm,uc} \Test{mc,pm,uc}
\Test{zc,mm,uc} \Test{zc,zm,uc} \Test{zc,pm,uc}
\Test{pc,mm,uc} \Test{pc,zm,uc} \Test{pc,pm,uc}
\stopcolumns

\page

wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank

\startcolumns[n=3]
\Test{mm,mc,uc} \Test{zm,mc,uc} \Test{pm,mc,uc}
\Test{mm,zc,uc} \Test{zm,zc,uc} \Test{pm,zc,uc}
\Test{mm,pc,uc} \Test{zm,pc,uc} \Test{pm,pc,uc}
\stopcolumns

\page

\dorecurse {2} {
   \page \recurselevel:
\index{oá}  \index{öb}  \index{Oč}  \index{Öď}
\index{oo}  \index{öo}  \index{Oo}  \index{Öo}
\index{Öq}  \index{öř}  \index{Oš}  \index{oů}
   done
}

\stoptext


> 2. Is it really a good idea to make case-sensitive sorting the default in
> English? I can't remember seeing a single academic book in English that has
> this sort of index sorting.


Currently Jano and I are figuring out some details (as Jano does the
testing with more complex multilingual indices).


I have no preference ... we can configure each language independently
using the method key in the entries in sort-lan.lua. As I seldom consult
an index, I have no clue what to expect or default to, so feel free to
tell me what the defaults should be. We now have predefined:


local predefinedmethods = {
    [variables.before] = "mm,mc,uc",
    [variables.after]  = "pm,mc,uc",
    [variables.first]  = "pc,mm,uc",
    [variables.last]   = "mc,mm,uc",
}
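
How a method value might be turned into a list of comparison steps can be sketched like this. It is standalone Python, not the ConTeXt implementation: the key names "before", "after", "first" and "last" are assumed to be what the `variables.*` entries stand for, and `resolve` is a hypothetical helper.

```python
# Assumed names for the predefined methods listed above.
PREDEFINED = {
    "before": "mm,mc,uc",
    "after":  "pm,mc,uc",
    "first":  "pc,mm,uc",
    "last":   "mc,mm,uc",
}

# Step names that occur in this thread.
KNOWN_STEPS = {"mm", "zm", "pm", "mc", "zc", "pc", "uc"}

def resolve(method):
    """Expand a predefined name or an explicit 'a,b,c' list into its steps."""
    spec = PREDEFINED.get(method, method)
    steps = [step.strip() for step in spec.split(",")]
    unknown = [step for step in steps if step not in KNOWN_STEPS]
    if unknown:
        raise ValueError("unknown sort step(s): " + ", ".join(unknown))
    return steps

print(resolve("before"))
print(resolve("zm,pc,uc"))
```

An explicit list such as the ones passed to `method={#1}` in the test file goes through the same path as a predefined name.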

Hans




Re: [NTG-context] two buglets

2010-10-03 Thread Thomas A. Schmitz

On Oct 3, 2010, at 12:29 PM, Hans Hagen wrote:

> indeed, and in a nice obscure way ...
 

Give me a chance to understand :-) I tried looking in sort-ini.lua, but I 
couldn't figure out what the different methods meant. What do the abbreviations 
stand for? Also, I seem to obtain the desired case-insensitive sorting with 
method=zm,pc,uc
but I also get spurious empty lines in the index. I'll try and come up with a 
minimal example.

 
>> 2. Is it really a good idea to make case-sensitive sorting the default in
>> English? I can't remember seeing a single academic book in English that has
>> this sort of index sorting.
>
> Currently Jano and I are figuring out some details (as Jano does the testing
> with more complex multilingual indices).
>
> I have no preference ... we can configure each language independently using
> the method key in the entries in sort-lan.lua. As I seldom consult an index, I
> have no clue what to expect or default to, so feel free to tell me what the
> defaults should be. We now have predefined:
>
> local predefinedmethods = {
>     [variables.before] = "mm,mc,uc",
>     [variables.after]  = "pm,mc,uc",
>     [variables.first]  = "pc,mm,uc",
>     [variables.last]   = "mc,mm,uc",
> }

Hmm, if this is easy to configure, it doesn't make much of a difference. Just 
as a default, for English and German, I would suggest having no 
case-sensitivity. In German, umlauts are somewhat contentious, but nowadays, 
most people would sort them just like normal letters. But this is something 
that others on the list or on the wiki should express their opinion on.

Thanks, and all best

Thomas


Re: [NTG-context] two buglets

2010-10-03 Thread Hans Hagen

On 3-10-2010 12:58, Thomas A. Schmitz wrote:



> Give me a chance to understand :-) I tried looking in sort-ini.lua, but I
> couldn't figure out what the different methods meant. What do the abbreviations
> stand for? Also, I seem to obtain the desired case-insensitive sorting with
> method=zm,pc,uc
> but I also get spurious empty lines in the index. I'll try and come up with a
> minimal example.


mm zm pm : use mapping order, add -1, 0, +1 to different case and use
shape info for missing entries (similar shapes)

mc zc pc : use mapping order, add -1, 0, +1 to different case
uc       : unicode order

so, you define a sequence of comparisons where for instance

U   -> order u +/- 1
\"u -> order of shape u +/- 1

etc .. a bit cryptic I admit ... some combinations give the same result
depending on the vectors used. (Jano promised to write up something.)


numbers are sorted in a special way

so, at some point we simplify characters and start looking at shapes and
sort based on shapes, which of course leads to clashes, so in a next step
we look at unicodes etc etc
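
One possible reading of the -1, 0, +1 case offsets is sketched below. It is standalone Python and an interpretation of the description above, not the ConTeXt code: `BASE` is a hypothetical mapping vector that gives lowercase letters even slots, and the mode letters m, z and p are taken to mean minus, zero and plus.

```python
import string

OFFSET = {"m": -1, "z": 0, "p": +1}

# Hypothetical mapping vector: even slots for lowercase, leaving room for +/- 1.
BASE = {ch: 2 * (i + 1) for i, ch in enumerate(string.ascii_lowercase)}

def weights(word, mode):
    """Per-character weights; uppercase letters are shifted by the mode's offset."""
    shift = OFFSET[mode]
    return [BASE[ch.lower()] + (shift if ch.isupper() else 0) for ch in word]

def ordered(words, mode):
    return sorted(words, key=lambda word: weights(word, mode))

words = ["azygous", "Aardvark", "aardvark", "Azygous"]
for mode in "mzp":
    print(mode, ordered(words, mode))
```

Under this reading the zero offset ignores case altogether, which would fit the observation that method=zm,pc,uc gives case-insensitive results, while the plus offset reproduces the originally reported order with azygous ahead of Aardvark.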




> Hmm, if this is easy to configure, it doesn't make much of a difference. Just
> as a default, for English and German, I would suggest having no
> case-sensitivity. In German, umlauts are somewhat contentious, but nowadays,
> most people would sort them just like normal letters. But this is something
> that others on the list or on the wiki should express their opinion on.


best would be to have a test file per language with the expected order
in comments; such tests should also include foreign entries


for instance, how would you mix german and greek in your books? we
probably need some specialized vectors then, which is possible as the
sorting language can be configured independently of the text language


Hans



Re: [NTG-context] two buglets

2010-10-03 Thread Thomas A. Schmitz

On Oct 3, 2010, at 5:10 PM, Hans Hagen wrote:

 
> mm zm pm : use mapping order, add -1, 0, +1 to different case and use shape
> info for missing entries (similar shapes)
> mc zc pc : use mapping order, add -1, 0, +1 to different case
> uc: unicode order
>
> so, you define a sequence of comparisons where for instance
>
> U   -> order u +/- 1
> \"u -> order of shape u +/- 1
>
> etc .. a bit cryptic I admit ... some combinations give the same result
> depending on the vectors used. (Jano promised to write up something.)
>
> numbers are sorted in a special way
>
> so, at some point we simplify characters and start looking at shapes and sort
> based on shapes which of course leads to clashes so in a next step we look at
> unicodes etc etc
 

OK, that makes sense. I'll play with it, but having a few choice pages on the 
wiki would be great!

 
 
> best would be to have a test file per language with in comments the expected
> order; such tests should also provide foreign entries
>
> for instance, how would you mix german and greek in your books; we probably
> need some specialized vectors then, which is possible as the sorting language
> can be configured independent from the text language
 
OK, I'll write something for German and English, but the thing is that we need
more input on what users expect. For mixtures with foreign languages, there
might not be generally accepted rules at all, so people will define something
on an ad-hoc basis.

For Greek: I just looked at a dozen books here on my shelf. Most English books 
have a separate index for Greek terms; when they sort Greek terms with English 
words, they use transliteration. The problem with polytonic Greek is that so 
many different unicode characters need to have the same sort entry. If I ever 
see the necessity of setting this up, I'll be in touch off-list, but it's such 
an unusual thing that I think you shouldn't bother now.

All best

Thomas



[NTG-context] two buglets

2010-02-11 Thread Thomas A. Schmitz
Hi all,

working on a book project with index and bibliography, I discovered two small 
bugs (at least I think they are bugs):

1. index sorts uppercase letters after lowercase letters. Minimal example:

\starttext

\index{Aardvark}Aardvark

\index{azygous}azygous

\page

\setupregister[index][n=1]
\placeregister[index]

\stoptext

I would expect azygous to follow Aardvark, but it is sorted before.

2. (Maybe not a bug, but a somewhat unfriendly behavior): When a \cite command
refers to a non-existent key and sorttype=bbl, ConTeXt bombs out with a Lua error:

! LuaTeX error ...text/tex/texmf-context/tex/context/base/bibl-tra.lua:77: attempt to compare nil with number
stack traceback:
        ...text/tex/texmf-context/tex/context/base/bibl-tra.lua:77: in function <...text/tex/texmf-context/tex/context/base/bibl-tra.lua:76>
        [C]: in function 'sort'
        ...text/tex/texmf-context/tex/context/base/bibl-tra.lua:84: in function 'flush'
        <main ctx instance>:1: in main chunk.
\typesetpubslist ...hacks.flush(\@@pbsorttype )}
  \doendoflist 
\dodoplacepublications ...sttrue \typesetpubslist 
  \inpublistfalse \endgroup ...
l.37 \placepublications[criterium=all]
  
minimal example (the typo \cite[clarke199] instead of \cite[clarke1999a] is 
there on purpose to demonstrate the problem):

\setuppublications[state=start,
   sorttype=bbl,
   refcommand=authornum,
   numbering=yes]

\setuppublicationlist[samplesize={VSdK90},totalnumber=2]

\startpublication[k=champion2004,t=book,
a={{Champion}},y=2004,
n=10,s=Cha04]
\author[]{Craige~B.}[C.~B.]{}{Champion}
\pubyear{2004}
\title{Cultural Politics in Polybius's {\em Histories}}
\city{Berkeley}
\pubname{Univ. of California Pr.}
\stoppublication

\startpublication[k=clarke1999a,t=book,
a={{Clarke}},y=1999b,
n=9,s=Cla99b]
\author[]{Katherine}[K.]{}{Clarke}
\pubyear{1999\maybeyear{b}}
\title{Between Geography and History: Hellenistic Constructions of the Roman
  World}
\city{Oxford}
\pubname{Oxford UP}
\stoppublication

\starttext

\cite[champion2004]

\cite[clarke199]

\page

\placepublications[criterium=all]

\stoptext

Could this error be handled more gracefully, i.e. intercepted?
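
One way such an interception could look, in outline: set unknown keys aside before sorting, so the comparison never sees a missing value, and report them separately. This is a standalone Python sketch, not the code in bibl-tra.lua. The helper `sort_cited` is hypothetical, and using the n= values from the example as the sort order is an assumption made only for illustration.

```python
def sort_cited(cited_keys, order):
    """Sort known keys by their position in order; return unknown keys separately."""
    known = [key for key in cited_keys if key in order]
    missing = [key for key in cited_keys if key not in order]
    return sorted(known, key=order.__getitem__), missing

# Assumed ordering, taken from the n= values in the example above.
order = {"champion2004": 10, "clarke1999a": 9}

cited, missing = sort_cited(["champion2004", "clarke199"], order)
print(cited)    # keys that can be sorted safely
print(missing)  # keys to warn about instead of crashing
```

Whether a missing key should be dropped or listed with a placeholder is a separate decision; the sketch only shows that the sort itself can be kept away from undefined values.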

All best

Thomas


Re: [NTG-context] two buglets

2010-02-11 Thread Hans Hagen

On 11-2-2010 16:52, Thomas A. Schmitz wrote:

> Hi all,
>
> working on a book project with index and bibliography, I discovered two small
> bugs (at least I think they are bugs):
>
> 1. index sorts uppercase letters after lowercase letters. Minimal example:
>
> \starttext
>
> \index{Aardvark}Aardvark
>
> \index{azygous}azygous
>
> \page
>
> \setupregister[index][n=1]
> \placeregister[index]
>
> \stoptext
>
> I would expect azygous to follow Aardvark, but it is sorted before.



are you sure that that's the convention for english? it's easy to change 
it ...


\startluacode
sorters.mappings['en'] = {
    ["a"] =  2, ["b"] =  4, ["c"] =  6, ["d"] =  8, ["e"] = 10,
    ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
    ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
    ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
    ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
    ["z"] = 52,
    ["A"] =  1, ["B"] =  3, ["C"] =  5, ["D"] =  7, ["E"] =  9,
    ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
    ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
    ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
    ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
    ["Z"] = 51,
}
\stopluacode

\starttext
\index{Aardvark}Aardvark \par
\index{azygous}azygous
\placeregister[index][n=1]
\stoptext
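
To see what a table like this does to the order of entries, here is a small model in plain Lua. It is only a sketch: it assumes entries are compared character by character on their numeric codes, which is a simplification of whatever ConTeXt's sorter really does, and the names `codes` and `before` are made up for the illustration.

```lua
-- Model only: not ConTeXt's actual sorting code.
local codes = { }
for i = 1, 26 do
    codes[string.char(64 + i)] = 2 * i - 1  -- "A" = 1, "B" = 3, ...
    codes[string.char(96 + i)] = 2 * i      -- "a" = 2, "b" = 4, ...
end

-- true when entry x sorts before entry y under the given codes
local function before(x, y, map)
    for i = 1, math.min(#x, #y) do
        local cx = map[x:sub(i, i)] or 0    -- unmapped characters sort first
        local cy = map[y:sub(i, i)] or 0
        if cx ~= cy then
            return cx < cy
        end
    end
    return #x < #y
end

local words = { "azygous", "Aardvark", "aardvark", "Azygous" }
table.sort(words, function(x, y) return before(x, y, codes) end)
print(table.concat(words, " "))  -- Aardvark Azygous aardvark azygous
```

With distinct codes for the two cases, every capitalised entry comes before every lowercase one, so "Azygous" lands ahead of "aardvark".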



-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] two buglets

2010-02-11 Thread Hans Hagen

On 11-2-2010 16:52, Thomas A. Schmitz wrote:


> 2. (Maybe not a bug, but a somewhat unfriendly behavior): When a \cite command
> refers to a non-existent key and sort=bbl, ConTeXt bombs out with a lua error:


so what do you expect? to drop that entry? or else, what default key to 
use?
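
The two choices named here (drop the entry, or fall back to a default key) can be sketched in plain Lua. This is a hypothetical illustration, not ConTeXt's bibliography code: the table `known`, the function `resolve` and the fallback string are invented for the example.

```lua
-- Hypothetical stand-in for the loaded bibliography: key -> short sort string.
local known = {
    champion2004 = "Cha04",
    clarke1999a  = "Cla99b",
}

-- Resolve cited keys. An unknown key is dropped when no fallback is given,
-- otherwise it is kept with the fallback sort string. Either way the key is
-- collected in `missing` so it can be reported instead of raising an error.
local function resolve(cited, fallback)
    local found, missing = { }, { }
    for _, key in ipairs(cited) do
        local s = known[key]
        if s then
            found[#found + 1] = { key = key, sort = s }
        else
            missing[#missing + 1] = key
            if fallback then
                found[#found + 1] = { key = key, sort = fallback }
            end
        end
    end
    return found, missing
end

local found, missing = resolve({ "champion2004", "clarke199" })
for _, key in ipairs(missing) do
    io.stderr:write("warning: unknown publication key '" .. key .. "'\n")
end
```

Dropping is the simpler policy; a fallback sort string keeps the unknown key visible in the list, which may make the typo easier to find.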


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] two buglets

2010-02-11 Thread Thomas A. Schmitz

On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:

> are you sure that that's the convention for english? it's easy to change it
> ...
>
> \startluacode
> sorters.mappings['en'] = {
>     ['a'] =  2, ['b'] =  4, ['c'] =  6, ['d'] =  8, ['e'] = 10,
>     ['f'] = 12, ['g'] = 14, ['h'] = 16, ['i'] = 18, ['j'] = 20,
>     ['k'] = 22, ['l'] = 24, ['m'] = 26, ['n'] = 28, ['o'] = 30,
>     ['p'] = 32, ['q'] = 34, ['r'] = 36, ['s'] = 38, ['t'] = 40,
>     ['u'] = 42, ['v'] = 44, ['w'] = 46, ['x'] = 48, ['y'] = 50,
>     ['z'] = 52,
>     ['A'] =  1, ['B'] =  3, ['C'] =  5, ['D'] =  7, ['E'] =  9,
>     ['F'] = 11, ['G'] = 13, ['H'] = 15, ['I'] = 17, ['J'] = 19,
>     ['K'] = 21, ['L'] = 23, ['M'] = 25, ['N'] = 27, ['O'] = 29,
>     ['P'] = 31, ['Q'] = 33, ['R'] = 35, ['S'] = 37, ['T'] = 39,
>     ['U'] = 41, ['V'] = 43, ['W'] = 45, ['X'] = 47, ['Y'] = 49,
>     ['Z'] = 51,
> }
> \stopluacode
>
> \starttext
> \index{Aardvark}Aardvark \par
> \index{azygous}azygous
> \placeregister[index][n=1]
> \stoptext

No, I'm not sure at all. All I can say is that a quick check in my scholarly 
books didn't bring up a single example where uppercase and lowercase were 
treated differently. If I apply your code, I will have the same problem with 
Azygous - aardvark. How would I write the table so that lowercase and 
uppercase are not distinguished at all? I tried

\startluacode
sorters.mappings['en'] = {
   ['a'] =  1, ['b'] =  2, ['c'] =  3, ['d'] =  4, ['e'] =  5,
   ['f'] =  6, ['g'] =  7, ['h'] =  8, ['i'] =  9, ['j'] = 10,
   ['k'] = 11, ['l'] = 12, ['m'] = 13, ['n'] = 14, ['o'] = 15,
   ['p'] = 16, ['q'] = 17, ['r'] = 18, ['s'] = 19, ['t'] = 20,
   ['u'] = 21, ['v'] = 22, ['w'] = 23, ['x'] = 24, ['y'] = 25,
   ['z'] = 26,
}
\stopluacode

but that didn't work.

Thomas


Re: [NTG-context] two buglets

2010-02-11 Thread David Rogers

* Hans Hagen pra...@wxs.nl [2010-02-11 18:17]:

> are you sure that that's the convention for english? it's easy to
> change it ...


I've never seen an ordinary English index that was sorted by case.
English indexes should definitely default to case-insensitive.

(Has anyone here ever been asked for an index in English sorted by
case?)


--
David


Re: [NTG-context] two buglets

2010-02-11 Thread Hans Hagen

On 11-2-2010 18:35, Thomas A. Schmitz wrote:


> On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:
>
>> are you sure that that's the convention for english? it's easy to change it ...
>>
>> \startluacode
>> sorters.mappings['en'] = {
>>     ['a'] =  2, ['b'] =  4, ['c'] =  6, ['d'] =  8, ['e'] = 10,
>>     ['f'] = 12, ['g'] = 14, ['h'] = 16, ['i'] = 18, ['j'] = 20,
>>     ['k'] = 22, ['l'] = 24, ['m'] = 26, ['n'] = 28, ['o'] = 30,
>>     ['p'] = 32, ['q'] = 34, ['r'] = 36, ['s'] = 38, ['t'] = 40,
>>     ['u'] = 42, ['v'] = 44, ['w'] = 46, ['x'] = 48, ['y'] = 50,
>>     ['z'] = 52,
>>     ['A'] =  1, ['B'] =  3, ['C'] =  5, ['D'] =  7, ['E'] =  9,
>>     ['F'] = 11, ['G'] = 13, ['H'] = 15, ['I'] = 17, ['J'] = 19,
>>     ['K'] = 21, ['L'] = 23, ['M'] = 25, ['N'] = 27, ['O'] = 29,
>>     ['P'] = 31, ['Q'] = 33, ['R'] = 35, ['S'] = 37, ['T'] = 39,
>>     ['U'] = 41, ['V'] = 43, ['W'] = 45, ['X'] = 47, ['Y'] = 49,
>>     ['Z'] = 51,
>> }
>> \stopluacode
>>
>> \starttext
>> \index{Aardvark}Aardvark \par
>> \index{azygous}azygous
>> \placeregister[index][n=1]
>> \stoptext
>
> No, I'm not sure at all. All I can say is that a quick check in my scholarly
> books didn't bring up a single example where uppercase and lowercase were
> treated differently. If I apply your code, I will have the same problem with
> Azygous - aardvark. How would I write the table so that lowercase and
> uppercase are not distinguished at all? I tried
>
> \startluacode
> sorters.mappings['en'] = {
>     ['a'] =  1, ['b'] =  2, ['c'] =  3, ['d'] =  4, ['e'] =  5,
>     ['f'] =  6, ['g'] =  7, ['h'] =  8, ['i'] =  9, ['j'] = 10,
>     ['k'] = 11, ['l'] = 12, ['m'] = 13, ['n'] = 14, ['o'] = 15,
>     ['p'] = 16, ['q'] = 17, ['r'] = 18, ['s'] = 19, ['t'] = 20,
>     ['u'] = 21, ['v'] = 22, ['w'] = 23, ['x'] = 24, ['y'] = 25,
>     ['z'] = 26,
> }
> \stopluacode
>
> but that didn't work.


just give them the same code, so A=1, a=1

(we could make that an option: upper first, lower first, mixed)
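
A minimal sketch of the "mixed" variant, written so that it also runs in a plain Lua interpreter: the table is filled in a loop and each uppercase letter gets the same code as its lowercase partner. The name `merged` is made up for the example, and the final assignment (shown as a comment) assumes the `sorters.mappings` interface exactly as it is used in this thread; whether other ConTeXt versions accept a table in this form is not confirmed here.

```lua
-- One code per letter, shared by both cases.
local merged = { }
for i = 1, 26 do
    merged[string.char(64 + i)] = i  -- "A" .. "Z" -> 1 .. 26
    merged[string.char(96 + i)] = i  -- "a" .. "z" -> 1 .. 26
end

-- Inside \startluacode ... \stopluacode the table would then be installed
-- the same way as the hand-written ones above (assumption, see lead-in):
-- sorters.mappings['en'] = merged

print(merged["A"], merged["a"], merged["Z"], merged["z"])
```

A table that lists only the lowercase letters leaves the uppercase ones without a code, which is presumably why that attempt did not help.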

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-


Re: [NTG-context] two buglets

2010-02-11 Thread Thomas A. Schmitz

On Feb 11, 2010, at 8:29 PM, Hans Hagen wrote:

> just give them the same code, so A=1, a=1
>
> (we could make that an option: upper first, lower first, mixed)
>
> Hans

Thank you, Hans, that works nicely! It would be good to have this as an option. 
And I would vote for having the mixed setting as default. I wasn't even aware 
that there were indexes that sort according to case.

All best

Thomas