Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Jonathan Kew
On 6/5/15 14:14, Joseph Wright wrote: Based on the current files, we have a block to set \XeTeXcharclass, which only applies to XeTeX. The logic followed in that code is that characters in the file LineBreak.txt which have class ID (ideographs) not only set the \XeTeXcharclass class to 1 but
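The approach described here (pick out the LineBreak.txt entries whose class is ID and give them \XeTeXcharclass 1) might be sketched roughly as follows. This is only an illustration and not the code the LaTeX team uses: the sample lines are made up in the LineBreak.txt layout (code point or range, semicolon, class, optional # comment), and the names SAMPLE, ranges_with_class and charclass_lines are hypothetical.

```python
from typing import Iterator, List, Tuple

# Illustrative lines in the LineBreak.txt layout; NOT copied from a real release.
SAMPLE = """\
# comment-only line
0041..005A;AL # sample alphabetic range
3400..4DBF;ID # sample ideographic range
2E80;ID # sample single code point
"""


def ranges_with_class(text: str, wanted: str = "ID") -> Iterator[Tuple[int, int]]:
    """Yield (first, last) code point ranges whose line-break class is `wanted`."""
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        span, _, klass = data.partition(";")
        if klass.strip() != wanted:
            continue
        first, _, last = span.strip().partition("..")
        yield int(first, 16), int(last or first, 16)


def charclass_lines(text: str, charclass: int = 1) -> List[str]:
    """Emit one \\XeTeXcharclass assignment per code point in each ID range."""
    out = []
    for first, last in ranges_with_class(text):
        for cp in range(first, last + 1):
            out.append('\\XeTeXcharclass "%04X = %d' % (cp, charclass))
    return out
```

In practice one would loop inside TeX rather than write out one assignment per code point; the sketch only shows which entries are selected.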

[XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Joseph Wright
Hello all, As some people will have seen, the LaTeX team have recently integrated setting of codes (\catcode, \lccode, etc.) for the entire Unicode range into the kernel when XeTeX/LuaTeX are in use. This is not a functional change for end users but does mean that the team now have some control

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread Arthur Reutenauer
While working on these bugs, we also discussed how surrogate characters were handled in XeTeX. Surrogate characters are the 2048 code points that are used in UTF-16 to encode characters with code points above 65536: a pair of them makes up one Unicode character; however they're not meant to be
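For readers unfamiliar with the arithmetic, here is a minimal sketch of how UTF-16 pairs surrogates. It is plain Unicode arithmetic and says nothing about how XeTeX handles these code points internally; the function names are hypothetical.

```python
from typing import Tuple

SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF  # 2048 code points in total
HIGH_START, LOW_START = 0xD800, 0xDC00


def is_surrogate(cp: int) -> bool:
    """True if cp is one of the surrogate code points."""
    return SURROGATE_FIRST <= cp <= SURROGATE_LAST


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a code point beyond the BMP into its (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogates")
    offset = cp - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a correctly ordered (high, low) pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)
```

For example, U+1D11E splits into the pair (0xD834, 0xDD1E), and recombining that pair gives back U+1D11E.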

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread David Carlisle
On 6 May 2015 at 23:04, Arthur Reutenauer arthur.reutena...@normalesup.org wrote: While working on these bugs, we also discussed how surrogate characters were handled in XeTeX. Surrogate characters are the 2048 code points that are used in UTF-16 to encode characters with code points above

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread David Carlisle
The character itself, as bytes that is, is not wrong and users should be able to create these. But preferably through macros that ensure that they come correctly paired. Placing two character tokens representing a surrogate pair should not, though, magically turn into a single character.

[XeTeX] Re: Re: Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Apostolos Syropoulos
The only mark that remains when making all capitals is the diaeresis (dialytika). All others vanish. This is common knowledge for people who speak and write Greek. AS
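The rule stated above (uppercase, drop every mark except the dialytika) could be sketched like this. It is an assumption-laden illustration, not what xgreek.sty or any TeX format does: it works on decomposed text, treats every nonspacing mark other than U+0308 as removable, and would therefore also strip accents from non-Greek letters. The name greek_upper is hypothetical.

```python
import unicodedata

DIALYTIKA = "\u0308"  # COMBINING DIAERESIS


def greek_upper(text: str) -> str:
    """Uppercase text, removing all combining marks except the dialytika."""
    decomposed = unicodedata.normalize("NFD", text.upper())
    kept = [
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" or ch == DIALYTIKA
    ]
    return unicodedata.normalize("NFC", "".join(kept))
```

With this sketch, U+1F10 (epsilon with psili) becomes plain U+0395, while U+03CA (iota with dialytika) becomes U+03AA and keeps its mark.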

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread Ross Moore
Hi Arthur, On 07/05/2015, at 8:04, Arthur Reutenauer arthur.reutena...@normalesup.org wrote: While working on these bugs, we also discussed how surrogate characters were handled in XeTeX. Surrogate characters are the 2048 code points that are used in UTF-16 to encode characters with code

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Joseph Wright
On 06/05/2015 21:06, David Carlisle wrote: On 6 May 2015 at 20:15, Philip Taylor p.tay...@rhul.ac.uk wrote: Apostolos Syropoulos wrote: It seems to me that most people have no idea what Unicode is and what is really involved. OK, so if we restrict the Universe of Discourse to the set of

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Julian Bradfield
On 2015-05-06, Apostolos Syropoulos asyropou...@yahoo.com wrote: I checked a bit the file and I have noticed that \L 1F10 1F18 1F10 % while xgreek.sty defines \global\lccode"1F10="1F10 \global\uccode"1F10="0395 You see the uppercase of 'GREEK SMALL LETTER EPSILON WITH PSILI' is 'GREEK LETTER

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread David Carlisle
On 6 May 2015 at 20:15, Philip Taylor p.tay...@rhul.ac.uk wrote: Apostolos Syropoulos wrote: It seems to me that most people have no idea what Unicode is and what is really involved. OK, so if we restrict the Universe of Discourse to the set of native Hellenic speakers who know what

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Joseph Wright
On 06/05/2015 15:09, Jonathan Kew wrote: On 6/5/15 14:14, Joseph Wright wrote: Based on the current files, we have a block to set \XeTeXcharclass, which only applies to XeTeX. The logic followed in that code is that characters in the file LineBreak.txt which have class ID (ideographs) not

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Philip Taylor
David Carlisle wrote: I don't think that's the right question. Even if everyone, including the Unicode technical committee, agreed some properties are incorrect for some characters, it isn't clear we should change them at this level. You are (inadvertently) conflating my question with

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Joseph Wright
On 06/05/2015 16:04, Apostolos Syropoulos wrote: Hello, I checked a bit the file and I have noticed that \L 1F10 1F18 1F10 % while xgreek.sty defines \global\lccode"1F10="1F10 \global\uccode"1F10="0395 You see the uppercase of 'GREEK SMALL LETTER EPSILON WITH PSILI' is 'GREEK

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Philip Taylor
Apostolos Syropoulos wrote: It seems to me that most people have no idea what Unicode is and what is really involved. OK, so if we restrict the Universe of Discourse to the set of native Hellenic speakers who know what Unicode is, know the importance of being able to use it to identify

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Philip Taylor
Apostolos Syropoulos wrote: I'd suggest that the basic (Xe|Lua)TeX formats should simply follow Unicode properties. In addition, I would suggest that somewhere it is explained why this is not correct. Otherwise, people would see strange things and might wonder why they see them. How

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Apostolos Syropoulos
How united is the Hellenic-speaking world about this, Apostolos ? Is it a universal truth, universally accepted, or are there some (even just a few) who maintain that Unicode is right and everyone else is wrong ? It seems to me that most people have no idea what Unicode is and what is

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Jonathan Kew
On 6/5/15 16:29, Philip Taylor wrote: Apostolos Syropoulos wrote: the uppercase of 'GREEK SMALL LETTER EPSILON WITH PSILI' is 'GREEK LETTER EPSILON' and not 'GREEK LETTER EPSILON WITH PSILI'. Some time ago I reported this to the Unicode people and they told me something like we cannot

Re: [XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

2015-05-06 Thread Ross Moore
Hi David, On 07/05/2015, at 9:26 AM, David Carlisle wrote: The character itself, as bytes that is, is not wrong and users should be able to create these. But preferably through macros that ensure that they come correctly paired. Placing two character tokens representing a surrogate pair

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Philip Taylor
Apostolos Syropoulos wrote: the uppercase of 'GREEK SMALL LETTER EPSILON WITH PSILI' is 'GREEK LETTER EPSILON' and not 'GREEK LETTER EPSILON WITH PSILI'. Some time ago I reported this to the Unicode people and they told me something like we cannot change it now (I do not remember the

Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Apostolos Syropoulos
Hello, I checked a bit the file and I have noticed that \L 1F10 1F18 1F10 % while xgreek.sty defines \global\lccode"1F10="1F10 \global\uccode"1F10="0395 You see the uppercase of 'GREEK SMALL LETTER EPSILON WITH PSILI' is 'GREEK LETTER EPSILON' and not 'GREEK LETTER EPSILON WITH PSILI'. Some
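To see where the two mappings differ, the following sketch reproduces the \L entry from Unicode's default case mappings and prints the formal character names. It assumes the columns of the \L line are code point, uppercase, lowercase (inferred from the single entry quoted above); letter_entry and describe are hypothetical helper names.

```python
import unicodedata


def simple_case(cp: int, method: str) -> int:
    """Return the one-to-one case mapping of cp, or cp itself if there is none."""
    mapped = getattr(chr(cp), method)()
    return ord(mapped) if len(mapped) == 1 else cp


def letter_entry(cp: int) -> str:
    """Format an entry as '\\L <code> <uppercase> <lowercase>' (assumed column order)."""
    return "\\L %04X %04X %04X" % (
        cp, simple_case(cp, "upper"), simple_case(cp, "lower")
    )


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' using the Unicode character database."""
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp)))


print(letter_entry(0x1F10))  # Unicode default mapping
print(describe(0x1F18))      # uppercase according to Unicode
print(describe(0x0395))      # uppercase chosen by xgreek.sty
```

The Unicode default sends U+1F10 to U+1F18 (capital epsilon with psili), whereas the \uccode quoted from xgreek.sty sends it to U+0395 (plain capital epsilon).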