We had a discussion in the SII and the consensus was that we should object
to:
- any change or addition related to Hebrew that would invalidate existing
Unicode data or require its modification or re-examination
- any change or addition to Unicode that would make the use of Hebrew more
I am trying to convert a LaTeX document into Word via UTF-8-encoded HTML.
When I import a small test
http://www.mimuw.edu.pl/~jsbien/poufne/utf8-pjk.html
http://www.mimuw.edu.pl/~jsbien/poufne/utf8-pjk.css
into Word, I see it correctly. To be precise, sufficiently correctly
(a
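Here is a minimal sketch of the kind of small test file described above. It assumes, and the thread does not confirm this, that a `<meta>` charset declaration is what the importing application relies on to pick the decoder. The helper names `build_utf8_html` and `write_utf8_html` and the sample text are hypothetical. This is not the content of the test files at the URLs above.

```python
import html
import pathlib
import tempfile


def build_utf8_html(title: str, body: str) -> str:
    """Return an HTML 4.01 document that declares UTF-8 in a <meta> element."""
    return (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n'
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<p>{html.escape(body)}</p>\n"
        "</body>\n</html>\n"
    )


def write_utf8_html(path: pathlib.Path, title: str, body: str) -> None:
    # Encode as UTF-8 explicitly so the bytes on disk agree with the declaration.
    path.write_text(build_utf8_html(title, body), encoding="utf-8")


if __name__ == "__main__":
    out = pathlib.Path(tempfile.gettempdir()) / "utf8-test.html"
    write_utf8_html(out, "UTF-8 test", "Zażółć gęślą jaźń / 日本語")
    print(out)
```

The design keeps the declared charset and the actual byte encoding in one place, so the two cannot drift apart.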
Ken Whistler wrote on 07/28/2003 08:34:50 PM:
I doubt it. I think it is much more likely that the stability of
normalization per se will hold. And when people finally come to
understand
that Unicode normalization forms don't meet all of their
string equivalencing needs, the pressure will
Ken Whistler wrote on 07/25/2003 07:39:59 PM:
Of course, ZWNBSP is not a base character...
There is no need for an invisible base character here.
Moreover, a space of any type would be a particularly bad thing -- it's not
two words.
- Peter
Ken Whistler wrote:
...
which I think is as faulty as that of people who might claim that,
for example, storing ä for Swedish as a, combining diaeresis
would be incorrect from a user's point of view.
I have no problem at all with ä (precomposed) being equivalent to a,
combining diaeresis. I
On 28/07/2003 18:28, John Cowan wrote:
Peter Kirk scripsit:
Napoleon managed to impose and are still uniform all the way from Calais
to Vladivostok (because even the Russians accepted his system for a
while), even traffic rules (drive on the right, give way to the right),
but are different
Well, in either case, the original point falls to bits. Neither of the two
countries matches the original descriptor of 'the at-the-time most progressive
nation on Earth'.
Nor does any other. It's simply much too simplistic a statement.
K
- Original Message -
From: John Cowan [EMAIL
Well, that was precisely the question. Are we talking about a mere
preference of visual effect or an actual difference in (original) text--that
is, an intended semantic differentiation?
K
- Original Message -
From: Jony Rosenne [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, July
On 28/07/2003 19:05, Kenneth Whistler wrote:
...
This is, of course, precisely the desired result -- the CGJ is
ignored for weighting, but its presence prevents the reordering
of the vowels into the undesired sequence by normalization.
And the resultant weighted key weights the vowels in the
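The normalization half of this claim can be checked directly. Here is a minimal sketch using Python's `unicodedata`. It assumes lamed + patah + hiriq + final mem as one way of pointing the end of Yerushala(y)im. Canonical ordering sorts adjacent marks with nonzero combining classes (hiriq is 14, patah is 17). CGJ has class 0, so it separates the two vowels and nothing is reordered. The collation-weighting half is not modeled here.

```python
import unicodedata

LAMED = "\u05DC"      # HEBREW LETTER LAMED
PATAH = "\u05B7"      # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"      # HEBREW POINT HIRIQ, combining class 14
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM
CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, combining class 0

# Patah typed before hiriq under the same lamed.
plain = LAMED + PATAH + HIRIQ + FINAL_MEM
# The same sequence with CGJ inserted between the two vowels.
with_cgj = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM

for form in ("NFC", "NFD"):
    # Without CGJ, canonical ordering moves hiriq (14) in front of patah (17).
    assert unicodedata.normalize(form, plain) == LAMED + HIRIQ + PATAH + FINAL_MEM
    # With CGJ, the vowels are no longer adjacent marks, so the order survives.
    assert unicodedata.normalize(form, with_cgj) == with_cgj

print([unicodedata.combining(ch) for ch in with_cgj])
```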
Both HTML files open in Word 2002 without problems, Polish and Japanese
characters included.
Raymond Mercier
- Original Message -
From: Janusz S. Bień [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, July 29, 2003 9:56 AM
Subject: UTF-8 and HTML import into MS Word 2000
I am trying to
Thank you, Jony, for taking this discussion to the SII and for bringing the
response back to this group.
Based on the SII response, it sounds like either doing nothing (within
Unicode proper) or developing Ken's CGJ proposal are the leading contenders
at this point.
Also [inre CGJ and ZWNBSP]:
[The following was posted to the Biblical Hebrew list and I am forwarding it
as potentially helpful information regarding this issue, which was raised
here. Not sure whether I should post the name/source?]
I do not have facsimiles of the L and Aleppo manuscripts at hand now, but I am
nearly sure
On 28/07/2003 21:18, Jony Rosenne wrote:
The most reasonable way to achieve visible effects, as opposed to difference
in text, is by markup.
Jony
But, Jony, this IS a difference in the text. It is a different character
sequence with a very different pronunciation and a thousand year history
Peter Kirk said:
If we are to use markup to distinguish between characters which are
semantically and phonetically as well as graphically distinct, we may as
well reduce Unicode to one character and make all distinctions with
markup. ;-)
Then it would truly be UNI-code!
K
Karljürgen Feuerherm scripsit:
Well, in either case, the original point falls to bits. Neither of the two
countries matches the original descriptor of 'the at-the-time most progressive
nation on Earth'.
In terms of reform of this kind, the U.S. certainly does match, thanks to
Thomas Jefferson,
I'm willing to concede that the US may have been the most progressive nation
on earth with respect to the *specifically restricted context* of
rationalizing the currency system in use in that place at that time :)
The original statement sounded rather more all-encompassing.
K
- Original
Peter Kirk scripsit:
This reminds me of the polytonic Greek issue. If I understand correctly,
the Greek government decided to do away with the distinction between
accents because this was easier to implement with 1960s computers.
I find that hard to believe, to say the least. Surely
On 28/07/2003 23:37, Jony Rosenne wrote:
We had a discussion in the SII and the consensus was that we should object
to:
- any change or addition related to Hebrew that would invalidate existing
Unicode data or require its modification or re-examination
I can agree that any change should not
On 29/07/2003 06:11, Karljürgen Feuerherm wrote:
Well, that was precisely the question. Are we talking about a mere
preference of visual effect or an actual difference in (original) text--that
is, an intended semantic differentiation?
K
I don't agree that ancient history should necessarily
Peter Kirk said:
I don't agree that ancient history should necessarily determine this.
It's a bit like the distinction between U and V in English, in fact
closely analogous phonetically. As originally used in English they were
one character. But I don't think that would justify an argument
At 07:31 -0700 2003-07-29, Peter Kirk wrote:
I don't think you French Canadians would be very happy if accented
upper case vowels were removed from Unicode because they are not
used in France.
This isn't true. They *are* used in France.
--
Michael Everson * * Everson Typography * *
Yes, Michael is right: even if a lot of people don't use accented
upper case (they don't know how to type it, or don't know that they can),
this IS the rule in French typography.
Just have a look at Le Monde http://www.lemonde.fr
Bertrand
On Tuesday, 29 Jul 2003, at 16:58 Europe/Paris,
Peter Kirk peter dot r dot kirk at ntlworld dot com wrote:
If we are to use markup to distinguish between characters which are
semantically and phonetically as well as graphically distinct, we may
as well reduce Unicode to one character and make all distinctions with
markup. ;-)
That would
I believe they're optional though, at least, aren't they?
K
- Original Message -
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, July 29, 2003 10:58 AM
Subject: Re: Back to Hebrew, was OT:darn'd fools
At 07:31 -0700 2003-07-29, Peter Kirk wrote:
I don't
Peter Kirk posted:
I don't think you French Canadians would be very happy if accented upper
case vowels were removed from Unicode because they are not used in
France. (I must find some way to divide you from the real French
But accented upper case vowels are used in France.
See
On 29/07/2003 07:58, Michael Everson wrote:
At 07:31 -0700 2003-07-29, Peter Kirk wrote:
I don't think you French Canadians would be very happy if accented
upper case vowels were removed from Unicode because they are not used
in France.
This isn't true. They *are* used in France.
OK, but
- Original Message -
From: Karljürgen Feuerherm [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: 29 Jul. 2003 11:47
Subject: French accents on uppercase, was Back to Hebrew, was OT:darn'd
fools
I believe they're optional though, at least, aren't they?
Depends on the source. But good
At 11:47 -0400 2003-07-29, Karljürgen Feuerherm wrote:
I believe they're optional though, at least, aren't they?
Not in good typography. You must unlearn what you have learned
--
Michael Everson * * Everson Typography * * http://www.evertype.com
At 11:52 -0400 2003-07-29, Jim Allan wrote:
On the other hand, dropping diacritics from
names or text written in all uppercase is
considered acceptable in Quebec French (and I
suspect also in France) dating from old
addressograph technology and billing typewriter
technology where capital
At 08:47 -0700 2003-07-29, Peter Kirk wrote:
Another example might be German ß (U+00DF). Many
people don't use it, indeed I think it has been
officially abolished, but many others do use it.
Peter, there isn't a shred of truth in what you are saying.
--
Michael Everson * * Everson Typography *
UTR 20, table 4.1, states that ZWNJ and ZWJ are needed for « a.o. Persian ».
What does the abbreviation « a.o. » mean? Arabic O..?
Is this current Farsi or some historical Persian?
Thank you
P.A.
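One commonly cited modern Persian case is the verbal prefix mi-, which is kept visually separate from the verb stem with ZWNJ. This is an illustration, not something UTR 20 itself gives. The sketch below uses می‌روم ("I go"). It shows that ZWNJ is an ordinary format character (category Cf) that normalization leaves in place. Dropping it yields a different string, in which the yeh would join to the following reh when rendered.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

# meem + farsi yeh + ZWNJ + reh + waw + meem: "mi-ravam" ("I go")
with_zwnj = "\u0645\u06CC" + ZWNJ + "\u0631\u0648\u0645"
without_zwnj = with_zwnj.replace(ZWNJ, "")

print(unicodedata.name(ZWNJ), unicodedata.category(ZWNJ))
print(unicodedata.name(ZWJ), unicodedata.category(ZWJ))

# Normalization keeps the joiner; removing it changes the text itself.
assert unicodedata.normalize("NFC", with_zwnj) == with_zwnj
assert with_zwnj != without_zwnj
```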
Adobe Acrobat!
-Original Message-
From: Karljürgen Feuerherm [mailto:[EMAIL PROTECTED]
Sent: Sunday, 13 July 2003 16:09
To: [EMAIL PROTECTED]
Subject: Re: No UTF-8 in Eudora
Adobe FrameMaker. It desperately needs it.
K
- Original Message -
From: Don Osborn [EMAIL PROTECTED]
Michael Everson posted:
Then you have the old problem: what does « LE
PRESIDENT ASSASSINE » mean if such a practice is
employed?
Yes.
The context where all capital French without diacritics occurs in Canada
is generally in mailing lists where name and address data and other data
is all
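The ambiguity Michael raises can be reproduced mechanically. This is a minimal sketch of an all-caps, accent-dropping output device. The helper `upper_without_diacritics` is hypothetical and is not a description of any real addressograph or billing system. Once the acute accent is stripped, "assassiné" (assassinated) and "assassine" (assassinates) become the same string. Proper uppercasing keeps them apart.

```python
import unicodedata


def upper_without_diacritics(text: str) -> str:
    """Uppercase text after discarding all nonspacing marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)  # é -> e + U+0301
    base_only = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return base_only.upper()


killed = "Le président assassiné"  # "the president assassinated"
kills = "Le président assassine"   # "the president assassinates"

assert upper_without_diacritics(killed) == upper_without_diacritics(kills)
print(upper_without_diacritics(killed))  # LE PRESIDENT ASSASSINE
print(killed.upper())                    # LE PRÉSIDENT ASSASSINÉ
```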
On Tue, 2003-07-29 at 21:06, Patrick Andries wrote:
UTR 20, table 4.1 writes that ZWNJ and ZWJ are needed for « a.o. Persian ».
What does the abbreviation « a.o. » mean? Arabic O..?
I have no clue about that, but ...
Is this current Farsi or some historical Persian ?
ZWNJ and ZWJ are required
Peter Kirk posted:
Another example might be German ß (U+00DF). Many people don't use it,
indeed I think it has been officially abolished, but many others do use
it. Suppose that it wasn't already in Unicode, and someone suggested it
shouldn't be added but should be encoded as ss with markup. I
Patrick Andries scripsit:
UTR 20, table 4.1 writes that ZWNJ and ZWJ are needed for « a.o. Persian ».
What does the abbreviation « a.o. » mean ? Arabic O..?
It appears to mean "among others", but it is not, as far as I know, a commonly
understood abbreviation.
--
How they ever reached any
Don't worry. I'm not looking for more work.
I just think it is worth going into this eyes open, and the recent BH
discussions have made me aware of how little I actually understood what I
thought I did.
A gramme of caution now is worth a tonne of cure later (in 'most
progressive nation of
At 10:36 -0700 2003-07-29, Peter Kirk wrote:
The only shred of untruth is that what I said I think is true is in
fact an exaggeration, the abolition is only partial.
Hence it was not officially abolished.
--
Michael Everson * * Everson Typography * * http://www.evertype.com
On 29/07/2003 09:21, Michael Everson wrote:
At 08:47 -0700 2003-07-29, Peter Kirk wrote:
Another example might be German ß (U+00DF). Many people don't use it,
indeed I think it has been officially abolished, but many others do
use it.
Peter, there isn't a shred of truth in what you are
On 29/07/2003 10:20, Roozbeh Pournader wrote:
On Tue, 2003-07-29 at 21:06, Patrick Andries wrote:
UTR 20, table 4.1 writes that ZWNJ and ZWJ are needed for « a.o. Persian ».
What does the abbreviation « a.o. » mean? Arabic O..?
I have no clue about that, but ...
Is this current Farsi or
Jim Allan said:
Yet I've talked to French speakers at various times some years back who
had never noticed the difference until I pointed it out to them.
I find that amazing (note that I am not questioning the assertion). When I
learned to (hand)write, I was specifically instructed to take care
I think it is reasonably common. I was about to post the same remark you
did.
K
- Original Message -
From: John Cowan [EMAIL PROTECTED]
To: Patrick Andries [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tuesday, July 29, 2003 1:11 PM
Subject: Re: Meaning of a.o. Persian
Patrick Andries
Peter Kirk wrote on 07/29/2003 09:22:35 AM:
Or is markup
being suggested as a solution of the Yerushala(y)im issue? If so I fail
to see how it addresses the problem, as markup does not inhibit
normalisation.
The markup-based solution would have to be something like
yerushalaiai/aim
which
Sorry, posted to the wrong list, I think.
K
On 29/07/2003 10:39, Michael Everson wrote:
At 10:36 -0700 2003-07-29, Peter Kirk wrote:
The only shred of untruth is that what I said I think is true is in
fact an exaggeration, the abolition is only partial.
Hence it was not officially abolished.
OK, it was officially abolished only from
At 06:11 AM 7/29/2003, Karljürgen Feuerherm wrote:
Well, that was precisely the question. Are we talking about a mere
preference of visual effect or an actual difference in (original) text--that
is, an intended semantic differentiation?
A good question, and one for which I would like to know the
At 06:27 AM 7/29/2003, Ted Hopp wrote:
Based on the SII response, it sounds like either doing nothing (within
Unicode proper) or developing Ken's CGJ proposal are the leading contenders
at this point.
As stated previously, I'm reasonably happy with CGJ as a re-ordering
inhibitor *if* the
At 06:35 AM 7/29/2003, Peter Kirk wrote:
This reminds me of the polytonic Greek issue. If I understand correctly,
the Greek government decided to do away with the distinction between
accents because this was easier to implement with 1960s computers.
1982. The reasons were manifold, and
Okay -- there are two Hebrew vowels that are not encoded in Unicode. Their
(transliterated) Hebrew names are (caps indicate syllable accent): khoLAM
maLE and shuRUQ. The kholam male LOOKS like a vav with holam [05D5.05B9]
or the alphabetic presentation form FB4B (HEBREW LETTER VAV WITH HOLAM) and
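A short check of why the presentation form cannot carry a separate "kholam male" in plain text. FB4B has a canonical decomposition to 05D5 05B9. It is also one of the Hebrew presentation forms excluded from recomposition, so both NFC and NFD turn it into the ordinary vav + holam sequence. This illustrates the normalization behaviour only. It does not settle the encoding question being argued.

```python
import unicodedata

VAV_WITH_HOLAM = "\uFB4B"     # HEBREW LETTER VAV WITH HOLAM (presentation form)
VAV, HOLAM = "\u05D5", "\u05B9"

print(unicodedata.decomposition(VAV_WITH_HOLAM))  # 05D5 05B9

for form in ("NFC", "NFD"):
    # The presentation form is indistinguishable from vav + holam after normalization.
    assert unicodedata.normalize(form, VAV_WITH_HOLAM) == VAV + HOLAM
```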
On 29/07/2003 10:52, [EMAIL PROTECTED] wrote:
A variation (assuming that canonical ordering does not occur around markup
tags) might be something like
yerushalaCanonicalOrderingBlock
- Peter
If inserting an otherwise dummy piece of markup in the middle of a
canonical combining sequence
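Whether a dummy tag protects the vowel order depends on where normalization is applied. This sketch uses a hypothetical empty element named after the `CanonicalOrderingBlock` idea above. Normalizing the serialized markup leaves the vowels alone, because `<` and `>` have combining class 0. Normalizing the character data after parsing removes that protection, because the tag disappears and the marks become adjacent.

```python
import unicodedata
import xml.etree.ElementTree as ET

LAMED, PATAH, HIRIQ, FINAL_MEM = "\u05DC", "\u05B7", "\u05B4", "\u05DD"

# Hypothetical markup: an empty element placed between the two vowels.
source = f"<w>{LAMED}{PATAH}<CanonicalOrderingBlock/>{HIRIQ}{FINAL_MEM}</w>"

# 1. Normalizing the raw markup: the tag characters separate the marks.
assert unicodedata.normalize("NFC", source) == source

# 2. Normalizing the parsed text: the vowels become adjacent and are reordered.
text = "".join(ET.fromstring(source).itertext())
assert text == LAMED + PATAH + HIRIQ + FINAL_MEM
assert unicodedata.normalize("NFC", text) == LAMED + HIRIQ + PATAH + FINAL_MEM
```

This supports the point above that markup by itself does not inhibit normalization. Any protection depends on the processing pipeline never normalizing the extracted text.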
At 11:37 PM 7/28/2003, Jony Rosenne wrote:
Consequently, it was suggested that the several issues with Biblical Hebrew
recently mentioned, and several more which were not, should be solved by
means of markup, outside the scope of Unicode. This is how they have been
addressed in many of the
At 12:34 PM 7/25/2003, Kenneth Whistler wrote:
b. a minor political problem (that certain communities of Biblical
scholars are badmouthing Unicode because it can't fix its
obvious mistakes)
Wasn't it Michael Everson who made the comment about fixing obvious
mistakes? I'm not aware of
On 29/07/2003 10:46, John Hudson wrote:
At 06:11 AM 7/29/2003, Karljürgen Feuerherm wrote:
Well, that was precisely the question. Are we talking about a mere
preference of visual effect or an actual difference in (original)
text--that
is, an intended semantic differentiation?
A good question,
On 28/07/2003 23:37, Jony Rosenne wrote:
We had a discussion in the SII and the consensus was that we should object
to:
- any change or addition related to Hebrew that would invalidate existing
Unicode data or require its modification or re-examination
- any change or addition to Unicode that
At 22:21 +0200 2003-07-29, Jony Rosenne wrote:
With Hebrew, it is not accepted that it is a different Vav - letters
used as matres lectionis are not distinct from the same letters used
otherwise. Neither is it accepted that this is a different Holam.
The only thing established is that this
At 11:52 -0400 2003-07-29, Jim Allan wrote:
On the other hand, dropping diacritics from names or text written in
all uppercase is considered acceptable in Quebec French (and I
suspect also in France) dating from old addressograph technology and
billing typewriter technology where capital
Ken,
I am trying to get a grasp on the problem. Thanks for your explanations. If
you continue typing slowly enough, perhaps it will eventually get through.
And the fact that you and others arguing for the
canonical ordering change don't seem to recognize the distinction
is part of the reason why
Meteg to the right does not actually need an extra character, because if
CGJ is used to override canonical equivalence and reordering of vowel
sequences, the mechanism is already in place to use it in exactly the
same way for sequences of vowels and meteg.
Otherwise we
would write Karlj<fronted>u</fronted>rgen or the like.
Actually, that would have been preferable to the way some of my official id
actually appears :(
K
- Forwarded by Joan Wardell/IntlAdmin/WCT on 07/29/2003 03:08 PM -
John Hudson
John Hudson wrote on 07/29/2003 12:36:01 PM:
Perhaps you would like to expand on this? What kind of markup? How would
it interact with fonts and rendering engines?
It seems to me it would not, unless application software were explicitly
written to support the markup conventions and use some
Jony Rosenne wrote on 07/29/2003 03:21:08 PM:
The only thing established is that this artifact has been used in
several manuscripts, one of many similar artifacts, to aid the
understanding of the text. And the correct vehicle to convey such
artifacts is markup.
You say this as if it's
At 15:41 -0500 2003-07-29, [EMAIL PROTECTED] wrote:
Jony Rosenne wrote on 07/29/2003 03:21:08 PM:
The only thing established is that this artifact has been used in
several manuscripts, one of many similar artifacts, to aid the
understanding of the text. And the correct vehicle to convey such
Ken quoted me out of context, but perhaps I was unclear. At one point, I
said that I didn't think a medial meteg character was necessary for
rendering, because the ligation can be handled with the left meteg.
Earlier, we were discussing various options for solving the re-ordering
problem and
Fine, so we need a separate Unicode for each usage of gh in English.
Jony
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ted Hopp
Sent: Tuesday, July 29, 2003 8:20 PM
To: [EMAIL PROTECTED]
Subject: SPAM: Re: Back to Hebrew -holem-waw vs
Bertrand Laidain posted:
No, it's not! What is considered acceptable is dropping the diacritics
when the capital letter is an INITIAL letter (and even that is debated), but
when you write in ALL capitals it is definitely not acceptable!
Fair enough for newspapers headlines and all-caps headers in text,
On 29/07/2003 12:23, John Hudson wrote:
In this case, there are two encoding preferences with related display
preferences. One preference preserves and displays a distinction, and
one preference removes and hides a distinction. I prefer the former,
and various contributors have explained why
On 29/07/2003 12:38, Michael Everson wrote:
At 22:21 +0200 2003-07-29, Jony Rosenne wrote:
With Hebrew, it is not accepted that it is a different Vav - letters
used as matres lectionis are not distinct from the same letters used
otherwise. Neither is it accepted that this is a different Holam.
At 03:33 PM 7/29/2003, Peter Kirk wrote:
Fonts don't get that clever.
Probably not. Do they have any option to set a flag like the last
character was a vowel which can then be tested when the next character is
painted? If so there is a chance of detecting this efficiently without
having to be
On 29/07/2003 12:48, [EMAIL PROTECTED] wrote:
Meteg to the right does not actually need an extra character, because if
CGJ is used to override canonical equivalence and reordering of vowel
sequences, the mechanism is already in place to use it in exactly the
same way for sequences of vowels and
On 29/07/2003 13:03, Karljürgen Feuerherm wrote:
Otherwise we
would write Karlj<fronted>u</fronted>rgen or the like.
Actually, that would have been preferable to the way some of my official id
actually appears :(
K
And probably to what some software does with it. One of your recent
On 29/07/2003 13:10, [EMAIL PROTECTED] wrote:
[quoting John Hudson I think - PK]
shin hataf dagesh shindot new right meteg
Surely, if we allocate a sensible combining class to our new character
based on its logical position, as we are presumably free to do, this
would be normalised as:
On 29/07/2003 15:44, John Hudson wrote:
At 03:33 PM 7/29/2003, Peter Kirk wrote:
Fonts don't get that clever.
Probably not. Do they have any option to set a flag like the last
character was a vowel which can then be tested when the next
character is painted? If so there is a chance of
On Tuesday, July 29, 2003 7:27 PM, Jony Rosenne wrote:
Fine, so we need a separate Unicode for each usage of gh in English.
Absolutely. We already have 007C (VERTICAL LINE), 01C0 (LATIN LETTER DENTAL
CLICK), 2223 (DIVIDES), and 2758 (LIGHT VERTICAL BAR).
We also have 00C5 (LATIN CAPITAL LETTER
At 03:16 PM 7/29/2003, Kenneth Whistler wrote:
How about:
shin regular meteg CGJ hataf dagesh shindot
The CGJ prevents the reordering of the meteg around the hataf and
dagesh, and the sequence meteg, CGJ, hataf gives the font
a separate sequence to ligate, distinguishing it from
hataf,
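Ken's sequence can be checked against the standard combining classes: hataf patah 12, dagesh 21, meteg 22, shin dot 24. In this minimal sketch, the meteg typed first is sorted between dagesh and shin dot by canonical ordering. In Ken's spelling, the class-0 CGJ shields it. The remaining marks after the CGJ are already in ascending class order, so normalization leaves the whole sequence unchanged.

```python
import unicodedata

SHIN = "\u05E9"          # HEBREW LETTER SHIN
HATAF_PATAH = "\u05B2"   # combining class 12
DAGESH = "\u05BC"        # combining class 21
METEG = "\u05BD"         # combining class 22
SHIN_DOT = "\u05C1"      # combining class 24
CGJ = "\u034F"           # combining class 0

typed = SHIN + METEG + HATAF_PATAH + DAGESH + SHIN_DOT
ken = SHIN + METEG + CGJ + HATAF_PATAH + DAGESH + SHIN_DOT

for form in ("NFC", "NFD"):
    # Without CGJ, meteg (22) is moved after hataf (12) and dagesh (21).
    assert unicodedata.normalize(form, typed) == SHIN + HATAF_PATAH + DAGESH + METEG + SHIN_DOT
    # With CGJ, meteg stays directly after the shin.
    assert unicodedata.normalize(form, ken) == ken
```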
At 04:11 PM 7/29/2003, Peter Kirk wrote:
Either I have not made myself clear or my understanding of the rendering
process is even less than I thought. Perhaps I should have said glyph
rather than character. But the real point is that I am suggesting some
kind of flag which could be preserved
Peter Kirk wrote:
On 29/07/2003 13:03, Karljürgen Feuerherm wrote:
Otherwise we
would write Karlj<fronted>u</fronted>rgen or the like.
Actually, that would have been preferable to the way some of my official
id
actually appears :(
And probably to what some software does with it. One
On 29/07/2003 16:28, John Hudson wrote:
At 04:11 PM 7/29/2003, Peter Kirk wrote:
Either I have not made myself clear or my understanding of the
rendering process is even less than I thought. Perhaps I should have
said glyph rather than character. But the real point is that I am
suggesting
This could also be done with Graphite, I think.
K
- Original Message -
From: John Hudson [EMAIL PROTECTED]
To: Peter Kirk [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tuesday, July 29, 2003 6:44 PM
Subject: Re: Back to Hebrew, was OT:darn'd fools
At 03:33 PM 7/29/2003, Peter Kirk wrote:
That really depends on the rendering engine. AAT can handle it without
too much difficulty (or, at least, the mathematical equivalent).
On Tuesday, July 29, 2003, at 5:11 PM, Peter Kirk wrote:
Either I have not made myself clear or my understanding of the
rendering process is even less than I
On 29/07/2003 16:37, Karljürgen Feuerherm wrote:
Peter Kirk wrote:
And probably to what some software does with it. One of your recent
messages to this list came with the following line in its source:
From: =?8859_1?B?S2FybGr8cmdlbg==?= Feuerherm [EMAIL PROTECTED]
and Mozilla renders that
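For reference, the header above is an RFC 2047 "B" (base64) encoded-word. Its charset label "8859_1" does not appear to be a registered name; the registered one is ISO-8859-1. This sketch decodes such words by hand. The alias table and the function `decode_b_words` are hypothetical and show what a lenient reader might do. They do not describe how Mozilla or any other client actually behaves.

```python
import base64
import re

# RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")

# Hypothetical lenient alias table: maps the unregistered "8859_1" label to Latin-1.
CHARSET_ALIASES = {"8859_1": "latin-1", "iso-8859-1": "latin-1"}


def decode_b_words(header: str) -> str:
    """Decode base64 encoded-words whose charset is in CHARSET_ALIASES."""
    def repl(m: re.Match) -> str:
        charset, encoding, payload = m.group(1), m.group(2), m.group(3)
        codec = CHARSET_ALIASES.get(charset.lower())
        if encoding.upper() != "B" or codec is None:
            return m.group(0)  # Q encoding or unknown charset: leave as-is
        return base64.b64decode(payload).decode(codec)
    return ENCODED_WORD.sub(repl, header)


print(decode_b_words("=?8859_1?B?S2FybGr8cmdlbg==?= Feuerherm"))
# prints: Karljürgen Feuerherm
```

The base64 payload decodes to the bytes `Karlj\xfcrgen`. Byte 0xFC is ü in Latin-1, so the name comes out intact only if the reader maps the nonstandard label to that charset.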
In section 3.4, UTR No. 20 speaks of « cursively-connected scripts ».
(http://www.unicode.org/reports/tr20/#Deprecated)
Unicode 4.0's glossary defines cursive as « writing where the letters of a
word are connected » (I have the same definition in a large French book
about the history of
On 29/07/2003 11:20, Ted Hopp wrote:
Okay -- there are two Hebrew vowels that are not encoded in Unicode. Their
(transliterated) Hebrew names are (caps indicate syllable accent): khoLAM
maLE and shuRUQ. The kholam male LOOKS like a vav with holam [05D5.05B9]
or the alphabetic presentation form