Re: Caron / Hacek?

2003-03-07 Thread Pim Blokland
John Hudson schreef: By the way, although Unicode calls it a cedilla, the correct form to use with G is the disconnected, 'under comma' form. Ah yes, the cedillas; now these are ambiguous! What is the correct form for cedillas under N, K, L, R, S and T? What should these look like? The fonts

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Pim Blokland
John Cowan schreef: Digraphs and ligatures are both made by combining two glyphs. In a digraph, the glyphs remain separate but are placed close together. In a ligature, the glyphs are fused into a single glyph. Oh, in that case I must say I think the UnicodeData.txt file doesn't do a very

Re: Caron / Hacek?

2003-03-07 Thread John Cowan
Pim Blokland scripsit: Now I must admit, I haven't come across many texts which used Ts with cedillas. Not in printed form, that is; the only ones I have seen were in electronic form, where their appearance depends on the font used. T with cedilla should never have existed. When s with comma

RE: Caron / Hacek?

2003-03-07 Thread Kent Karlsson
By the way, although Unicode calls it a cedilla, the correct form to use with G is the disconnected, 'under comma' form. Ah yes, the cedillas; now these are ambiguous! What is the correct form for cedillas under N, K, L, R, S and T? What should these look like? Well, the easy (and

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread John Cowan
Pim Blokland scripsit: For instance, the Danish ae (U+00E6) is not designated a ligature, It was in Unicode 1.0; I think politics were involved in that one. In Latin use, ae is most certainly a ligature, and likewise in the languages (including English) that have borrowed words involving it. In

RE: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Kent Karlsson
The names do NOT always provide correct descriptions of the characters. This is especially true for digraph and ligature (and in the case of U+00E6 too), as well as (e.g.) SCRIPT CAPITAL P, which is neither script, nor capital (it's lowercase), though it is a p... In addition, there are
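Kent's SCRIPT CAPITAL P example can be checked against the character database. This is a minimal sketch using Python's standard `unicodedata` module. The name stays frozen once it is assigned, so the General_Category property, not the name, is what actually classifies the character. The alias lookup assumes Python 3.3+ and a UCD version that includes the later formal name correction for U+2118.

```python
import unicodedata

ch = "\u2118"  # the character Kent refers to

# The assigned name is fixed, even though it is misleading.
name = unicodedata.name(ch)          # 'SCRIPT CAPITAL P'

# The General_Category says what it really is: Sm (math symbol),
# not Lu (uppercase letter).
category = unicodedata.category(ch)  # 'Sm'

# A formal name alias was added later as a correction; Python 3.3+
# resolves such aliases in unicodedata.lookup().
same_char = unicodedata.lookup("WEIERSTRASS ELLIPTIC FUNCTION") == ch

print(name, category, same_char)
```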

Re: FAQ entry

2003-03-07 Thread David Oftedal
Oh, in that case I must say I think the UnicodeData.txt file doesn't do a very good job. For instance, the Danish ae (U+00E6) is not designated a ligature, but the Dutch ij (U+0133) is, even though the a and e are clearly fused together, while the i and j aren't. Hm, this whole concept seems
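The inconsistency Pim describes can be seen directly in UnicodeData.txt. The sketch below reads the name and decomposition fields for U+00E6 and U+0133 through Python's `unicodedata` module. The helper `describe` is hypothetical and exists only for illustration. U+00E6 is named a LETTER and has no decomposition. U+0133 is named a LIGATURE and has a compatibility decomposition to i + j.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Hypothetical helper: (code point label, character name, decomposition field)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch)

for ch in ("\u00e6", "\u0133"):  # ae and ij
    print(*describe(ch))
```

The decomposition field is empty for U+00E6 and is `<compat> 0069 006A` for U+0133. The word LIGATURE in the name does not track whether the glyphs are visually fused.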

RE: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Kent Karlsson
For instance, the Danish ae (U+00E6) is not designated a ligature, It was in Unicode 1.0; I think politics were involved in that one. In Latin use, ae is most certainly a ligature, and likewise in the languages (including English) that have borrowed words involving it. In Danish use,

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread John H. Jenkins
On Friday, March 7, 2003, at 04:26 AM, Pim Blokland wrote: Oh, in that case I must say I think the UnicodeData.txt file doesn't do a very good job. For instance, the Danish ae (U+00E6) is not designated a ligature, but the Dutch ij (U+0133) is, even though the a and e are clearly fused

Re: FAQ entry

2003-03-07 Thread Pim Blokland
David Oftedal schreef: Hm, this whole concept seems stupid if you ask me. That's beside the point. The issue of this discussion is not how stupid this all is, but how consistent is the description of the UnicodeData.txt file. So I DO care whether I should call something a digraph or a ligature.

Re: The display of *kholam* on PCs

2003-03-07 Thread Julian Gilbey
On Thu, Mar 06, 2003 at 02:25:19PM -0500, Dean Snyder wrote: Ben Yehuda is a modern Hebrew dictionary, and, as I noted in my original email, I have little experience in modern, Israeli Hebrew - maybe the orthography is different there, I just don't know. Which is why I was limiting my remarks

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Roozbeh Pournader
On Fri, 7 Mar 2003, John H. Jenkins wrote: since different people speaking different languages often have different perceptions of what a symbol is. Reminds me of ISIRI 3342 that officially considered symbol and character the same thing and used one word (namaad, Noon, Meem, Alef, Dal) for

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Pim Blokland
Kent Karlsson schreef: Typographically, it's a ligature either way. You mean that both ae and ij should be called ligatures, although one is fused and the other isn't? OK, I can live with that. I'd rather the ij were called a digraph, though. The ij is considered by some to be one letter in

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Michael Everson
At 15:36 +0100 2003-03-07, Pim Blokland wrote: Kent Karlsson schreef: Typographically, it's a ligature either way. You mean that both ae and ij should be called ligatures, although one is fused and the other isn't? OK, I can live with that. I'd rather the ij were called a digraph, though. These

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread John Cowan
Kent Karlsson scripsit: E.g., it is quite legitimate to render, e.g. LIGATURE FI as an f followed by an i, no ligation, whereas that is not allowed for the ae ligature/letter, nor for the oe ligature. How do you know that? Either Caesar or Cæsar is good Latin.

RE: FAQ entry

2003-03-07 Thread Kent Karlsson
E.g., it is quite legitimate to render, e.g. LIGATURE FI as an f followed by an i, no ligation, whereas that is not allowed for the ae ligature/letter, nor for the oe ligature. How do you know that? Either Caesar or Cæsar is good Latin. That's the other way around. Ligating ae into æ
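The distinction Kent draws is reflected in the data as a compatibility decomposition. This is a minimal sketch using NFKC normalization from Python's `unicodedata`. The wrapper name `compat_fold` is hypothetical. NFKC folds U+FB01 LATIN SMALL LIGATURE FI to plain f + i. It leaves æ (U+00E6) and œ (U+0153 LATIN SMALL LIGATURE OE) untouched, because neither has a decomposition. Note that this shows the data model, not how any particular font renders the text.

```python
import unicodedata

def compat_fold(s: str) -> str:
    """Hypothetical wrapper: apply compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", s)

# U+FB01 has a <compat> decomposition, so it folds to "f" + "i".
fi = compat_fold("\ufb01")
# U+00E6 and U+0153 have no decomposition, so they survive unchanged.
ae = compat_fold("\u00e6")
oe = compat_fold("\u0153")

print(fi, ae, oe)
```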

RE: FAQ entry

2003-03-07 Thread Kent Karlsson
Typographically, it's a ligature either way. You mean that both ae and ij should be called ligatures, although one is fused and the other isn't? No. What I'm trying to say is that the names do not really matter. While there is a strive to give good names to characters, they sometimes are

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread John Cowan
Pim Blokland scripsit: The ij is considered by some to be one letter in Dutch, and when written down, an i and a j together look very much like a written y with diaeresis. (See fonts like Script MT.) So I can understand foreigners getting confused and encoding it that way (as a y with

Re: FAQ entry

2003-03-07 Thread John Cowan
Kent Karlsson scripsit: Ligating ae into æ works for Latin and sometimes English (could be done via a smart font). Always for English, I think: if someone finds a counterexample, let them use a + ZWNJ + e. Note that e.g. an fj ligature is just as legitimate and useful as an fi ligature
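John's a + ZWNJ + e suggestion comes down to inserting U+200C ZERO WIDTH NON-JOINER between the two letters. The sketch below assumes that a smart font ligates "ae" by default. The helper `suppress_ligature` and the sample word are hypothetical. Whether a given font and shaping engine honor the ZWNJ is outside what this code can guarantee.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def suppress_ligature(word: str, pair: str) -> str:
    """Hypothetical helper: put ZWNJ between the two letters of `pair`
    everywhere it occurs in `word`, asking the renderer not to ligate them."""
    if len(pair) != 2:
        raise ValueError("pair must be exactly two characters")
    return word.replace(pair, pair[0] + ZWNJ + pair[1])

marked = suppress_ligature("aerial", "ae")
print(repr(marked))
```

The visible text stays the same. Only the invisible control character is added, so the word is one code point longer.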

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Doug Ewell
Michael Everson everson at evertype dot com wrote: You mean that both ae and ij should be called ligatures, although one is fused and the other isn't? OK, I can live with that. I'd rather the ij were called a digraph, though. These terms are not normative. Get used to it. The names

Re: FAQ entry

2003-03-07 Thread David Oftedal
What an interesting character ij, or y is. It really shows how languages evolve over time. As for the æ: How do you know that? Either Caesar or Cæsar is good Latin. We're not necessarily talking about Latin here. In Norwegian and Danish, æ is not a ligature, but a separate sound almost

RE: FAQ entry

2003-03-07 Thread Kent Karlsson
Actually, it is of orthographic significance: it is not uncommon for good fonts to have an fj ligature. That's typography, not orthography. But I would appreciate if more fonts had an fj ligature, and (e.g.) a gj ligature too (in some fonts gj otherwise has overlapping glyphs). /kent

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Michael Everson
At 08:23 -0800 2003-03-07, Doug Ewell wrote: The names themselves are normative, of course. What is not normative is the distinction between the terms LETTER, LIGATURE, and DIGRAPH used in the names. Just wanted to clarify that for Pim. I didn't say the names are not normative. I said the terms

Re: Caron / Hacek?

2003-03-07 Thread John Hudson
At 01:49 AM 3/7/2003, Pim Blokland wrote: Ah yes, the cedillas; now these are ambiguous! What is the correct form for cedillas under N, K, L, R, S and T? What should these look like? The fonts I've seen disagree on all of them: some have commas, others have real cedillas. Since Unicode 3.0 came

Re: Caron / Hacek?

2003-03-07 Thread Pim Blokland
John Hudson schreef: The most problematical part of this is that 8-bit codepages supporting Romanian use the old S and T with *cedilla* codepoints, not the new S and T with comma codepoints. Apple updated their Romanian codepage shortly after those new characters appeared, five years ago. Not
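The legacy-codepage problem John describes is usually handled by remapping the old cedilla code points to the comma-below letters (U+0218 to U+021B, added in Unicode 3.0). This is a minimal sketch under one stated assumption: the input is known to be Romanian. Turkish ş, for example, genuinely takes a cedilla and must not be converted. The function name is hypothetical. NFC is applied first so that decomposed sequences such as s + U+0327 COMBINING CEDILLA are also caught.

```python
import unicodedata

# Old "cedilla" code points used by 8-bit Romanian code pages,
# mapped to the comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S WITH CEDILLA -> S WITH COMMA BELOW
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021a",  # T WITH CEDILLA -> T WITH COMMA BELOW
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})

def to_romanian_comma_below(text: str) -> str:
    """Hypothetical helper; only for text known to be Romanian."""
    # Compose s/t + combining cedilla into the precomposed letters first.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(CEDILLA_TO_COMMA)

print(to_romanian_comma_below("\u015etefan \u0163ar\u0103"))
```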

Re: FAQ entry

2003-03-07 Thread Noah Levitt
On Fri, Mar 07, 2003 at 17:27:08 +0100, David Oftedal wrote: We're not necessarily talking about Latin here. In Norwegian and Danish, æ is not a ligature, but a separate sound almost unpronounceable by English speakers. I believe æ is also a character in the IPA. Noah

Unicode VB

2003-03-07 Thread Magda Danish \(Unicode\)
-----Original Message----- Date/Time: Fri Mar 7 12:44:47 EST 2003 Contact: [EMAIL PROTECTED] Report Type: Other Question, Problem, or Feedback I was wondering when writing code for a program in Visual Basic.NET. Just a very very simple code that converts characters to

Re: length of text by different languages

2003-03-07 Thread Yung-Fong Tang
Ram Viswanadha wrote: There is also some information at http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html#Test_Results Not sure if this is what you are looking for. thanks. not really. I am not look into the

yaphalaa=(?)virama+ya

2003-03-07 Thread Anirban Mitra
Mijan scripsit: Let's consider the ra+virama+ya case. For the most part the ra+virama+ya is displayed as ya+reph. This obviously seems to be an instance of ambiguous interpretation because ra+virama+ya could also represent ra+ja-phalaa. ya+reph and ra+ja-phalaa are used in different words and