Hi All,
I have a question about whether certain Japanese characters exist in
Unicode.
The characters I'm referring to are those like the double-byte space, which
one can get from old NEC machines or which can be entered only through a
Japanese keyboard.
Can anyone please shed some light on this?
Regards,
Ken Whistler wrote on 06/25/2003 05:29:59 PM:
The point is that hiriq before patah is *not*
canonically equivalent to patah before hiriq,
This is true.
except in the erroneous
assumption of the Unicode Standard: the order of vowels makes words
sound
different and mean
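The canonical-equivalence point can be checked with a small sketch. It assumes Python's standard unicodedata module and the code points U+05DC LAMED, U+05B4 HIRIQ and U+05B7 PATAH; the variable names are only illustrative. It shows the mechanical effect of normalization and says nothing about what the two orders mean in Biblical Hebrew.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17

patah_first = LAMED + PATAH + HIRIQ
hiriq_first = LAMED + HIRIQ + PATAH

# The two vowels have different non-zero combining classes, so
# canonical reordering sorts them by class whatever the typed order.
classes = (unicodedata.combining(HIRIQ), unicodedata.combining(PATAH))
print(classes)

# After normalization both spellings become the same string.
same_after_nfc = (
    unicodedata.normalize("NFC", patah_first)
    == unicodedata.normalize("NFC", hiriq_first)
)
print(same_after_nfc)
```

In other words, a process that normalizes text cannot preserve patah-before-hiriq as distinct from hiriq-before-patah.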
Michael Everson wrote on 06/25/2003 04:36:20 PM:
[ re Biblical Hebrew ]
Write it up with glyphs and minimal pairs and people will see the
problem, if any. Or propose some solution. (One that isn't "add duplicate
characters".)
The only solution that UTC is willing to consider I have already
Jony Rosenne wrote on 06/26/2003 12:16:22 AM:
When, in the Bible, one sees two vowels on a given consonant, it isn't
so.
That's silly. When one sees two vowels on a given consonant in the Bible,
it *is* so: the two vowels are written there. It may not correspond to
actual phonology, ie what
Ken Whistler wrote on 06/25/2003 06:57:56 PM:
People could consider, for example, representation
of the required sequence:
lamed, qamets, hiriq, final mem
as:
lamed, qamets, ZWJ, hiriq, final mem
So, we want to introduce yet *another* distinct semantic for ZWJ? We've
got one for
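As a rough illustration of what the quoted ZWJ sequence would do under normalization (leaving aside whether this is an appropriate use of ZWJ), here is a sketch assuming Python's unicodedata module; the variable names are made up for the example.

```python
import unicodedata

LAMED = "\u05DC"      # HEBREW LETTER LAMED
QAMETS = "\u05B8"     # HEBREW POINT QAMATS, combining class 18
HIRIQ = "\u05B4"      # HEBREW POINT HIRIQ, combining class 14
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM
ZWJ = "\u200D"        # ZERO WIDTH JOINER, combining class 0

plain = LAMED + QAMETS + HIRIQ + FINAL_MEM
with_zwj = LAMED + QAMETS + ZWJ + HIRIQ + FINAL_MEM

# Without a separator, normalization moves hiriq in front of qamets.
plain_nfc = unicodedata.normalize("NFC", plain)
print(plain_nfc == LAMED + HIRIQ + QAMETS + FINAL_MEM)

# A class-0 character between the marks stops the reordering,
# so the typed order survives normalization.
zwj_nfc = unicodedata.normalize("NFC", with_zwj)
print(zwj_nfc == with_zwj)
```

Any character with combining class 0 would block the reordering in the same way; the sketch does not address rendering or the semantics of the inserted character.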
John Hudson wrote on 06/25/2003 06:47:44 PM:
This is not. The Unicode Standard makes no assumptions or claims
about what the phonological or meaning equivalence of hiriq, patah
or patah, hiriq is for Biblical Hebrew.
But it does make assumptions about the canonical equivalence of the mark
Karljürgen Feuerherm wrote on 06/25/2003 08:31:41 PM:
I was going to suggest something very similar, a ZW-pseudo-consonant of
some
kind, which would force each vowel to be associated with one consonant.
An invisible *consonant* doesn't make sense because the problem involves
more than just
Twenty-fourth Internationalization and Unicode Conference (IUC24)
Unicode, Internationalization, the Web: Powering Global Business
http://www.unicode.org/iuc/iuc24
Sourav,
Hi, your question is ambiguous to me.
You seem to be referring to the fullwidth space and other wide or
fullwidth characters.
For the fullwidth space look at U+3000 IDEOGRAPHIC SPACE.
Unicode has other fullwidth characters encoded. Look at the code charts...
hth
tex
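For anyone who wants to check this programmatically, here is a small sketch using Python's standard unicodedata module. The choice of U+FF21 as a second sample character is just for illustration.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"
FULLWIDTH_A = "\uFF21"  # FULLWIDTH LATIN CAPITAL LETTER A

# Name and East Asian Width property of the "double-byte space".
space_name = unicodedata.name(IDEOGRAPHIC_SPACE)
space_width = unicodedata.east_asian_width(IDEOGRAPHIC_SPACE)
print(space_name, space_width)

# Compatibility normalization (NFKC) folds fullwidth forms to their
# ordinary counterparts, which can help when comparing or searching.
folded = unicodedata.normalize("NFKC", IDEOGRAPHIC_SPACE + FULLWIDTH_A)
print(repr(folded))
```

Note that NFKC folding loses the fullwidth distinction, so it suits matching rather than round-tripping legacy data.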
souravm wrote:
Hi,
For those of you that couldn't attend and were interested in the exhibitor's
panel at the last Unicode conference, a brief summary is now online at:
http://www.unicode.org/iuc/iuc23/showcase-report.html
If you have any comments or feedback on the page, I would be glad to receive
it
On Wed, 25 Jun 2003 21:58:28 -0700, Elisha Berns wrote:
Some weeks back there were a number of postings about software for
viewing Unicode Ranges in TrueType fonts and I had a few questions about
that. Most viewers listed seemed to only check the Unicode Range bits of
the fonts which can be
It may look silly, but it is correct. What you see are letters according to
the writing tradition, which does not include a Yod, and vowels according to
the reading tradition which does. There are in the Bible other, more extreme
cases.
I don't think we need any new characters, ZERO WIDTH SPACE
Michael Everson wrote as follows.
At 08:44 -0700 2003-06-25, Doug Ewell wrote:
If it's true that either the UTC or WG2 has formally approved the
character, for a future version of Unicode or a future amendment to 10646,
then I don't see any reason why font makers can't PRODUCE a font with a
Peter Constable wrote as follows.
the name is simply a unique identifier within the std.
Well, the Standard is the authority for what is the meaning of the symbol
when found in a file of plain text. So if the symbol is in a plain text
file before or after the name of a person then the
Tom Gewecke wrote as follows.
My personal idea of an Orwellian nightmare would be to have a committee of
vigilant freedom protectors evaluating the political and social
implications of encoding symbols and passing judgement on whether
particular characters should be encoded and what their names
On Thursday, June 26, 2003 11:50 AM, Andrew C. West [EMAIL PROTECTED] wrote:
On Wed, 25 Jun 2003 21:58:28 -0700, Elisha Berns wrote:
Some weeks back there were a number of postings about software for
viewing Unicode Ranges in TrueType fonts and I had a few questions
about that. Most
On Thursday, June 26, 2003 2:26 PM, Philippe Verdy [EMAIL PROTECTED] wrote:
I forgot also the probably better function from the Uniscribe library, which processes
strings through a language-dependent shaping algorithm, and can determine appropriate
glyph substitution, or use custom composite
On Thu, 26 Jun 2003 14:26:13 +0200, Philippe Verdy wrote:
Isn't there a work-around with the following function (quote from Microsoft
MSDN):
(with the caveat that you first need to allocate and fill a Unicode string for
the
codepoints you want to test, and this can be lengthy if one wants to
William Overington WOverington at ngo dot globalnet dot co dot uk
wrote:
Well, certainly authority would be needed, yet I am suggesting that
where a few characters added into an established block are accepted,
which is what is claimed for these characters, there should be a
faster route than
At 12:43 AM 6/26/2003, [EMAIL PROTECTED] wrote:
The problem of combinations of vowels with meteg could be
amenable to a similar approach. OR, one could propose just
one additional meteg/silluq character, to make it possible
to distinguish (in plain text) instances of left-side and
right-side
At 04:26 AM 6/26/2003, Jony Rosenne wrote:
I don't think we need any new characters, ZERO WIDTH SPACE would do and it
requires no new semantics.
ZERO WIDTH SPACE would screw up search and sort algorithms, I think,
because it is not a control character per se and may not be ignored as desired.
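The search concern can be illustrated with a minimal sketch. It assumes a naive code-point substring match and a hypothetical helper, strip_zwsp, that a search routine would have to apply itself; real collation and search libraries may treat the character differently.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE

# lamed, qamets, hiriq, final mem, as a user might type the query
query = "\u05DC\u05B8\u05B4\u05DD"
# the same word stored with ZWSP between the two vowels
stored = "\u05DC\u05B8" + ZWSP + "\u05B4\u05DD"


def strip_zwsp(text):
    """Hypothetical pre-processing step: drop ZERO WIDTH SPACE before matching."""
    return text.replace(ZWSP, "")


# A plain substring test misses the stored form...
naive_hit = query in stored
# ...and only matches once the invisible character is removed explicitly.
stripped_hit = query in strip_zwsp(stored)
print(naive_hit, stripped_hit)
```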
Jony Rosenne wrote on 06/26/2003 06:26:02 AM:
It may look silly, but it is correct. What you see are letters
according to
the writing tradition, which does not include a Yod, and vowels
according to
the reading tradition which does.
I understand that. My point was, you were talking about
William Overington wrote on 06/26/2003 06:24:44 AM:
the name is simply a unique identifier within the std.
Well, the Standard is the authority for what is the meaning of the
symbol
when found in a file of plain text. So if the symbol is in a plain text
file before or after the name
William Overington wrote on 06/26/2003 07:03:12 AM:
yet I am suggesting that where a
few characters added into an established block are accepted, which is
what
is claimed for these characters, there should be a faster route than
having
to wait for bulk release in Unicode 4.1.
Once both UTC
Wow... How on earth did the subject line Major Defect in Combining
Classes of Tibetan Vowels turn into a discussion of Biblical Hebrew? At
least, people, if you're going to transmogrify the discussion, please use a
subject line such as Biblical Hebrew which someone already was wise
enough
At 13:03 +0100 2003-06-26, William Overington wrote:
Well, certainly authority would be needed, yet I am suggesting that where a
few characters added into an established block are accepted, which is what
is claimed for these characters, there should be a faster route than having
to wait for bulk
At 12:09 -0500 2003-06-26, [EMAIL PROTECTED] wrote:
The only meaning that the Standard implies is that the character encoded
at codepoint x represents the symbol of a wheelchair. It does not imply
*anything* about how its usage in juxtaposition with the name of a person
should be interpreted.
On Thursday, June 26, 2003 4:13 PM, Andrew C. West [EMAIL PROTECTED] wrote:
On Thu, 26 Jun 2003 14:26:13 +0200, Philippe Verdy wrote:
Isn't there a work-around with the following function (quote from
Microsoft MSDN):
(with the caveat that you first need to allocate and fill a Unicode
William Overington scripsit:
This issue has arisen because of my concern that a particular symbol has
been labelled as HANDICAPPED SIGN. I hope that the name will be changed to
WHEELCHAIR SYMBOL.
If you are going to discriminate (invidiously) using a computerized
database, using H for
WHEELCHAIR SYMBOL at least has the virtue of being descriptive of the symbol
rather than of the use and thus potentially more neutral all the way around.
K
- Original Message -
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, June 26, 2003 2:13 PM
Subject: Re:
Elisha Berns scripsit:
It's odd to think that the old way of using Charset identifiers in fonts
worked a lot more cleanly for finding fonts matching a language/language
group. I would think this kind of core issue would be addressed more
cleanly by the font standard.
Actually it worked by
Elisha Berns asked:
It would appear from your answer that even after implementing the
algorithm to search the Unicode block coverage of a font, the actual
comparison data, that is which blocks to compare and how many code
points, is totally undefined. Is there any kind of standard for
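A minimal sketch of the kind of block-coverage comparison being discussed. It assumes the font's supported code points have already been extracted into a set by some other means; block_coverage and the sample data are hypothetical and not taken from any font API.

```python
def block_coverage(supported, first, last):
    """Return (present, total) for the code point range first..last inclusive."""
    total = last - first + 1
    present = sum(1 for cp in range(first, last + 1) if cp in supported)
    return present, total


# Hypothetical font that covers only the Hebrew letters alef..tav.
supported = set(range(0x05D0, 0x05EA + 1))

# Hebrew block: U+0590..U+05FF.
present, total = block_coverage(supported, 0x0590, 0x05FF)
print(f"{present}/{total} code points present")
```

The count is taken against the whole block range, unassigned code points included, and the sketch sets no threshold for calling a block "supported". Those are exactly the comparison data the question says are undefined.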
On Thursday, June 26, 2003 8:16 PM, Elisha Berns [EMAIL PROTECTED] wrote:
It would appear from your answer that even after implementing the
algorithm to search the Unicode block coverage of a font, the actual
comparison data, that is which blocks to compare and how many code
points, is totally
At 10:09 AM 6/26/2003, [EMAIL PROTECTED] wrote:
The Meteg is a completely different issue. There is a small number
of places
where the Meteg is placed differently. Since it does not behave the same as
the regular Meteg, and is thus visually distinguishable, it should be
possible to add a
Doug, Peter, and Michael already provided good responses to
this suggestion by William O, but here is a little further
clarification.
Well, certainly authority would be needed, yet I am suggesting that where a
few characters added into an established block are accepted, which is what
is
At 14:32 -0400 2003-06-26, John Cowan wrote:
If you are going to discriminate (invidiously) using a computerized
database, using H for Handicapped (or G for Gimp) will do just as well.
Are you going to complain about the various symbols of religion already
encoded on the same grounds?
I am
How about RLM?
Jony
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Hudson
Sent: Thursday, June 26, 2003 6:36 PM
To: Jony Rosenne
Cc: [EMAIL PROTECTED]
Subject: SPAM: RE: Major Defect in Combining Classes of
Tibetan Vowels (Hebrew)
At
That may be what you see. Myself, every time I look at it, I see an orphaned
Hiriq without a consonant. It is normally placed in between the Lamed and
the Mem, to make certain the point isn't missed (a pun).
Jony
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Another consequence is that it separates the sequence into two
combining sequences, not one. Don't know if this is a serious problem,
especially since we are concerned with a limited domain with
non-modern usage, but I wanted to mention it.
Mark
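A rough way to see the splitting effect, under two stated assumptions: the separator is taken to be ZERO WIDTH SPACE, as suggested elsewhere in the thread (this message does not name the character), and a "combining sequence" is simplified to one non-mark character plus the marks that follow it. The function name is made up for the example.

```python
import unicodedata


def combining_sequences(text):
    """Split text into runs of one non-mark character plus its following marks.

    Simplified model: any character whose general category starts with "M"
    attaches to the run before it; everything else starts a new run.
    """
    runs = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and runs:
            runs[-1] += ch
        else:
            runs.append(ch)
    return runs


plain = "\u05DC\u05B8\u05B4\u05DD"              # lamed, qamets, hiriq, final mem
separated = "\u05DC\u05B8\u200B\u05B4\u05DD"    # same, with ZWSP between the vowels

print(len(combining_sequences(plain)))
print(len(combining_sequences(separated)))
```

Under this model the hiriq attaches to the inserted character rather than to the lamed, so the word gains an extra sequence.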
Jony took the words right out of my mouth:
How about RLM?
Jony
This already belongs, naturally, in the context of the Hebrew
text handling, which is going to have to handle bidi controls.
Another possibility to consider is U+2060 WORD JOINER, the
version of the zero width non-breaking space
Peter responded:
Ken Whistler wrote on 06/25/2003 06:57:56 PM:
People could consider, for example, representation
of the required sequence:
lamed, qamets, hiriq, final mem
as:
lamed, qamets, ZWJ, hiriq, final mem
So, we want to introduce yet *another* distinct
At 02:45 PM 6/26/2003, Mark Davis wrote:
Another consequence is that it separates the sequence into two
combining sequences, not one. Don't know if this is a serious problem,
especially since we are concerned with a limited domain with
non-modern usage, but I wanted to mention it.
It is a serious
At 03:04 PM 6/26/2003, Kenneth Whistler wrote:
How about RLM?
This already belongs, naturally, in the context of the Hebrew
text handling, which is going to have to handle bidi controls.
Ouch. RLM is not expected to fall between combining marks. Not only does
this not render correctly,
At 15:36 -0700 2003-06-26, Kenneth Whistler wrote:
I now like better the suggestions of RLM or WJ for this.
ZZZT. Thank you for playing.
RLM is for forcing the right behaviour for stops and parentheses and
question marks and so on. Introducing it between two combining
characters in Hebrew
Ken wrote...
I now like better the suggestions of RLM or WJ for this.
I'll have to disagree with Ken. I'm not so sure about either of these. I
don't think anyone has, in the past, considered what conforming or
non-conforming behavior would be for a RLM or WJ between two combining
marks.
At 03:36 PM 6/26/2003, Kenneth Whistler wrote:
Why is making use of the existing behavior of existing characters
a groanable kludge, if it has the desired effect and makes
the required distinctions in text? If there is not some
rendering system or font lookup showstopper here, I'm inclined
to
At 03:52 PM 6/26/2003, Rick McGowan wrote:
I'll weigh in to agree with Ken here. The solution of cloning a whole set
of these things just to fix combining behavior is, to understate, not quite
nice.
No, but would be far from the not nicest thing in Unicode, and there's a
really good reason for
Michael wrote:
At 15:36 -0700 2003-06-26, Kenneth Whistler wrote:
I now like better the suggestions of RLM or WJ for this.
ZZZT. Thank you for playing.
RLM is for forcing the right behaviour for stops and parentheses and
question marks and so on. Introducing it between two
1. I agree with Ken about the current lack of precedent for Cfs before
combining marks. Interestingly, we do have a proposal to do just
that, in
http://www.unicode.org/review/pr-9.pdf
However, note that the whole purpose of putting the Cf after the Ra is
to separate it from the halant, so
John,
At 03:36 PM 6/26/2003, Kenneth Whistler wrote:
Why is making use of the existing behavior of existing characters
a groanable kludge, if it has the desired effect and makes
the required distinctions in text? If there is not some
rendering system or font lookup showstopper here, I'm
John Hudson wrote:
At 03:52 PM 6/26/2003, Rick McGowan wrote:
I'll weigh in to agree with Ken here. The solution of cloning a whole set
of these things just to fix combining behavior is, to understate, not quite
nice.
No, but would be far from the not nicest thing in Unicode, and there's