Re: Devanagari and Subscript and Superscript
I missed this yesterday. Plug Gulp wrote:

> General support for all characters, words and sentences could be achieved by just three new formatting characters, e.g. SCR, SUP and SUB, similar to the way other formatting characters such as ZWS, ZWJ, ZWNJ etc are defined. The new formatting characters could be defined as:
>
> SCR: In a character stream, all the characters following this formatting character shall be treated as [...]
>
> SUP: In a character stream, all the characters following this formatting character shall be treated as [...]
>
> SUB: In a character stream, all the characters following this formatting character shall be treated as [...]

This isn't similar to ZWSP or ZWJ or ZWNJ. Those formatting characters are not stateful; they affect the rendering of, at most, the single characters immediately preceding and following them. The ones you suggest are stateful; they affect the rendering of arbitrary amounts of subsequent data, in a way reminiscent of ECMA-48 ("ANSI") attribute switching, or ISO 2022 character-set switching. Unicode tries hard to avoid encoding such things.

-- Doug Ewell | http://ewellic.org | Thornton, CO
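Doug's locality point can be made concrete. A joiner such as ZWNJ occupies a single position in the character stream and influences only the join it sits in, so a renderer never has to carry state beyond the current cluster. A minimal Python sketch (the code points are the real ZWNJ and Devanagari characters; the comments describe typical shaping behavior):

```python
# ZWNJ (U+200C) affects only the single join it sits in: here it
# suppresses the k.ssa conjunct while leaving the rest of the text alone.
KA, VIRAMA, SSA, ZWNJ = "\u0915", "\u094D", "\u0937", "\u200C"

conjoined = KA + VIRAMA + SSA          # typically renders as the kssa ligature
disjoined = KA + VIRAMA + ZWNJ + SSA   # typically renders as half-ka + ssa

# Removing the ZWNJ restores the original sequence exactly; no other
# characters in the stream are affected, i.e. the control is stateless.
assert disjoined.replace(ZWNJ, "") == conjoined
```

A stateful SUP-style control, by contrast, would change the interpretation of every character up to the next marker, however far away that is.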
Re: Devanagari and Subscript and Superscript
2015-12-16 19:16 GMT+01:00 Doug Ewell:

> The ones you suggest are stateful; they affect the rendering of arbitrary amounts of subsequent data, in a way reminiscent of ECMA-48 ("ANSI") attribute switching, or ISO 2022 character-set switching. Unicode tries hard to avoid encoding such things.

You can try as hard as you want: there are cases where it is impossible to avoid stateful encoding if we want to avoid disunifications, and even some characters that cannot work at all without stateful analysis. And this is not solved just by style markup when that "style" is in fact completely semantic. The situation must be considered with more care:

- For example, the superscript Latin letter o, aka "ordinal masculine", is not just a superscript but a notation adding the semantics of an abbreviation of the final letters, linked to the other letters before it, the whole being semantically a single word. The superscript style does not create such an attachment; it creates a separate "word" inside it, so it was disunified from the letter o.

- But it is not good practice to encode in Unicode things that are just styles without clear semantics (so encoding SUB/SUP is really a bad idea).

- Conversely, it is simply impossible to work with Egyptian hieroglyphs, as the default clusters are clearly insufficient to create ANY kind of plain text: you need extra markup to add the necessary semantics, not style, and this markup should be encodable as plain text, without external presentational markup, when the presentation is fully semantic and clear (e.g. the Egyptian "cartouche" for names of kings).

- Similar issues occur with SignWriting and other scripts that DO always require a complex (non-linear) layout, where basic clusters are clearly insufficient in ALL texts, meaning that the characters that were encoded are almost **useless** in all plain-text documents: you need extra "format" characters to create some form of orthographic rule, independently of the style or of an external markup language.

I'm in favor of adding **semantic** format characters in Unicode, not stylistic-only format characters, as soon as there exists a well-known orthographic convention that would work independently of styling. But for now the encoded format characters only work on too-small clusters, clusters are only linear, and this is clearly not enough (even for instructing other kinds of text analysis, such as breakers). Renderers would then be adapted and extended to work with more complex clusters, with internal structures built from simpler cluster parts. Other renderers using the legacy rules will not be able to do that, but will attempt to render some basic fallback (possibly with special visible glyphs for those controls).

One kind of semantic format character which is useful and encoded is the "invisible parentheses" for mathematics, which can be encoded for example after a radical sign: use them around a number to define the extension of the radical to more than one digit, making a clear visual and semantic distinction between "sqrt(24)" and "sqrt(2)4" when you don't want to render any parentheses, or between "sqrt(2+sqrt(3))" and "sqrt(2)+sqrt(3)".
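The sqrt(24) vs. sqrt(2)4 distinction above can be sketched with a toy scope parser. Unicode does encode some invisible mathematical operators (e.g. U+2062 INVISIBLE TIMES), but the OPEN/CLOSE grouping pair below is purely hypothetical, borrowed from the Private Use Area for illustration:

```python
# Hypothetical invisible grouping marks; NOT real Unicode characters.
# Two Private Use Area code points stand in for them.
OPEN, CLOSE = "\uE000", "\uE001"
SQRT = "\u221A"  # the real SQUARE ROOT sign

def radical_scope(text: str, i: int) -> str:
    """Return the substring governed by a radical sign at position i,
    assuming the radicand may be wrapped in the invisible OPEN/CLOSE
    pair; with no pair, only the single next character is covered."""
    if i + 1 < len(text) and text[i + 1] == OPEN:
        end = text.index(CLOSE, i + 1)
        return text[i + 2:end]
    return text[i + 1:i + 2]

assert radical_scope(SQRT + OPEN + "24" + CLOSE, 0) == "24"  # sqrt(24)
assert radical_scope(SQRT + "24", 0) == "2"                  # sqrt(2)·4
```

The point is that such characters carry pure semantics (extent of an operator), not style, which is what distinguishes them from the SUB/SUP proposal.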
Re: Devanagari and Subscript and Superscript
Plug Gulp wrote: > It will help if Unicode standard itself intrinsically supports > generalised subscript/superscript text. This falls outside the scope of "plain text" as defined by Unicode, in much the same way as bold and italic styles and colors and font faces and sizes. There are several rich-text formats besides HTML that support arbitrary subscript and superscript text. PDF and Word leap to mind. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Devanagari and Subscript and Superscript
Does the standard support the use of diacritics in plain text format, when used with all and any complex scripts?

Regards
Sinnathurai

> On 15 December 2015 at 17:46 Doug Ewell wrote:
>
> Plug Gulp wrote:
>
> > It will help if Unicode standard itself intrinsically supports generalised subscript/superscript text.
>
> This falls outside the scope of "plain text" as defined by Unicode, in much the same way as bold and italic styles and colors and font faces and sizes.
>
> There are several rich-text formats besides HTML that support arbitrary subscript and superscript text. PDF and Word leap to mind.
>
> -- Doug Ewell | http://ewellic.org | Thornton, CO
RE: Devanagari and Subscript and Superscript
srivas sinnathurai wrote: > Does the standard support the use of diacritics in plain text format, > when used with all and any complex scripts? It probably depends on what you mean by "support" and "diacritics." I can type a Tamil letter followed by a combining acute accent or diaeresis, and in Arial Unicode MS it actually looks halfway decent. Many years ago, William Overington famously put a combining circumflex on top of U+2604 COMET. You just type one character followed by another and hope for the best, display-wise. You don't get any other special behavior. I'm not sure if this was supposed to be a comment on my statement that arbitrary subscript and superscript is similar to other attributes that are not defined to be part of plain text. -- Doug Ewell | http://ewellic.org | Thornton, CO
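The "type and hope" behavior is easy to observe with Python's unicodedata module: normalization will not invent precomposed forms for arbitrary base + mark pairs, so such sequences remain two code points and their appearance is left entirely to the font:

```python
import unicodedata

# A combining mark can follow any base character; whether it renders
# well is up to the font. NFC only composes pairs that have a
# precomposed code point, so these stay two-character sequences.
tamil_ta_diaeresis = "\u0BA4\u0308"  # TAMIL LETTER TA + COMBINING DIAERESIS
comet_circumflex = "\u2604\u0302"    # COMET + COMBINING CIRCUMFLEX ACCENT

for seq in (tamil_ta_diaeresis, comet_circumflex):
    assert len(unicodedata.normalize("NFC", seq)) == 2

# Compare: e + combining acute does have a precomposed form.
assert unicodedata.normalize("NFC", "e\u0301") == "\u00E9"
```

No special behavior attaches to the first two sequences beyond default mark placement; that is all that "support" means here.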
Re: Devanagari and Subscript and Superscript
On Wed, Dec 9, 2015 at 5:18 AM, Martin J. Dürst wrote:

> I suggest using HTML: ब<sup>क्ष</sup>

This will work only if the end-users are always going to use a web browser to view the text content. It will help if the Unicode standard itself intrinsically supports generalised subscript/superscript text. I think the meaning of the text should be contained within the text itself rather than relying on external text markers and viewers. That way the text-content creator does not have to rely on what type of Unicode-compliant text viewer or editor the end user is using. The text should retain its meaning irrespective of the type of Unicode-compliant text viewer or editor used. Similarly, if the text has to be saved in a database without losing its meaning, then either it has to be saved with all the known markers of all the available editors, or some special processing needs to be incorporated to convert some saved marker to the markers of the various available text viewers and editors. Having generalised Unicode support for superscript and subscript will solve all these problems.

Following is one of the use-cases where general Unicode support for superscript/subscript will help tremendously: A math teacher (गणिताचे शिक्षक) in a Marathi (मराठी) language school is writing notes, in her Unicode-compliant plain text editor, to explain mathematical terms to her students. Following is an excerpt from the notes that explains terms such as exponents (घातांक) and base (पाया). (English translation is given below):

"जेव्हा एखाद्या संखेचा स्वतःशीच अनेक वेळा गुणाकार होतो तेव्हा त्या गुणाकाराला थोडक्यात लिहिण्याच्या पद्धतीला घातांक असे म्हणतात. उदाहरणार्थ, ५ ही संख्या जर स्वतःशी ३ वेळा गुणली जात असेल, म्हणजे ५ x ५ x ५, तर त्याला घातांक पद्धतीत ५^३ असे लिहितात. ह्या घातांकीय रचनेला "५ चा ३ रा घात" असे म्हणतात. आपण अजून एक उदाहरण घेऊया, "२ ना चा १० वा घात", म्हणजे २ ही संख्या स्वतःशी १० वेळा गुणली गेली आहे. ह्याला आपण २^१० असे लिहितो.
तर साधारणपणे, कूठलीही संख्या ब जेव्हा स्वतःशी क्ष वेळा गुणलीजाते तेव्हा त्याला घातांक पद्धतीत ब^क्ष असे लिहितात, आणि त्या रचनेला "ब चा क्ष वा घात" असे म्हणतात. इथे ब ह्या संखेला पाया म्हणतात आणि क्ष ह्या संखेला घात असे म्हणतात. तर थोडक्यात, घातांकीय रचनेला पाया^घात असे लिहितात."

English translation: "Exponent is a shorthand notation that denotes a multiplication of a number by itself a number of times. For example, if a number 5 is multiplied by itself 3 times, i.e. 5 x 5 x 5, then it is represented in exponential form as 5^3. This exponential term is referred to as "5 raised to the power of 3". Let us consider another example, "2 raised to the power of 10", i.e. 2 is multiplied by itself 10 times. This is written in exponential form as 2^10. So, in general, any number b that is multiplied by itself k number of times is written as b^k, and the term is referred to as "b raised to the power of k". The number b is called the base, and the number k is called the exponent. In short, an exponential term is written as base^exponent."

Please note that the teacher had to use a Circumflex Accent (Caret) to indicate superscript, which is an unwritten convention, in the absence of proper superscript support within Unicode. To make the text available to a wider audience and still retain its meaning, the teacher will have to rely partly on Unicode support, partly on the markers available in the various text viewers of her students, partly on the markers available in the text editors of the peer-reviewers of her text, and partly on the unwritten convention (such as the caret). This conundrum can be resolved only if there is generalised support for superscript and subscript within the Unicode standard. The standard already has a section for superscript and subscript. Generalising and extending this support will help other languages and scripts. General support for all characters, words and sentences could be achieved by just three new formatting characters, e.g.
SCR, SUP and SUB, similar to the way other formatting characters such as ZWS, ZWJ, ZWNJ etc. are defined. The new formatting characters could be defined as:

SCR: In a character stream, all the characters following this formatting character shall be treated as normal text until either the end of the character stream or the next SUP or SUB character is reached. This shall be the default marker, i.e. if no marker is specified then the text shall be treated as normal text until either the end of the character stream or the next SUP or SUB character is reached.

SUP: In a character stream, all the characters following this formatting character shall be treated as superscript text until either the end of the character stream or the next SCR or SUB character is reached.

SUB: In a character stream, all the characters following this formatting character shall be treated as subscript text until either the end of the character stream or the next SCR or SUP character is reached.

A general support within Unicode for subscripting and superscripting
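For concreteness, here is a sketch of what a renderer would have to do with the proposed markers. The three code points are hypothetical (borrowed here from the Private Use Area, since no SCR/SUP/SUB controls are actually encoded), and the parser illustrates the statefulness objection raised elsewhere in the thread: the current style must be carried across arbitrarily long runs of text:

```python
# Hypothetical code points for the proposed SCR / SUP / SUB controls.
SCR, SUP, SUB = "\uE000", "\uE001", "\uE002"
MARKERS = {SCR: "normal", SUP: "superscript", SUB: "subscript"}

def split_runs(stream: str):
    """Split a character stream into (style, text) runs following the
    proposed semantics: each marker switches the style for all
    subsequent characters until the next marker or end of stream."""
    runs, style, buf = [], "normal", []
    for ch in stream:
        if ch in MARKERS:
            if buf:
                runs.append((style, "".join(buf)))
            style, buf = MARKERS[ch], []
        else:
            buf.append(ch)
    if buf:
        runs.append((style, "".join(buf)))
    return runs

# "b raised to the power ksha": base, then a superscripted cluster.
assert split_runs("\u092C" + SUP + "\u0915\u094D\u0937") == [
    ("normal", "\u092C"),
    ("superscript", "\u0915\u094D\u0937"),
]
```

Note that every process handling such text (searching, line breaking, truncation) would need this state machine, not just renderers.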
Re: Devanagari and Subscript and Superscript
On Tue, 15 Dec 2015 18:00:16 + (GMT) srivas sinnathurai wrote:

> Does the standard support the use of diacritics in plain text format, when used with all and any complex scripts?

Relatively few scalar value sequences are prohibited - just possibly sequences containing unassigned characters that are not non-characters, but I can't think of any others. (The prohibition on unpaired surrogates applies to coded character sequences, but surrogate characters aren't scalar values.)

It would appear by Conformance Requirement C5, 'A process shall not assume that it is required to interpret any particular coded character sequence', that a process is at liberty to decline to interpret a sequence of scalar values, even if it has just interpreted it. I am not aware of any requirements in the standard to interpret specific character sequences.

In general, the interpretation of character sequences is undefined. For example, a request for advice on the interpretation of the combination of U+0331 COMBINING MACRON BELOW and U+0E39 THAI CHARACTER SARA UU was answered with the instruction to consult the non-existent typographical tradition. It's been left to rendering engine writers to define the interpretation.

Indeed, I am not sure that every sequence of defined scalar values has an interpretation. Most pairs of regional indicators don't have an interpretation, and the interpretation of each variation sequence may change at least twice: once when the base character becomes defined (or is defined not to be a possible base character), and again when the variation sequence is assigned an interpretation as an ill-defined (or grossly ill-defined) family of glyphs. Do U+0337 COMBINING SHORT SOLIDUS OVERLAY and U+20E5 COMBINING REVERSE SOLIDUS OVERLAY have a defined interpretation when their base character is to be represented by a mirrored glyph? Note that, in general, the Unicode standard does not define when a character is to be represented by a mirrored glyph. This may be defined by a lower-level protocol (the font file).

Richard.
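The parenthetical about surrogates not being scalar values can be demonstrated in Python, whose str type tolerates a lone surrogate in memory but refuses to serialize it as well-formed UTF-8 (a sketch of CPython 3 behavior):

```python
# A lone surrogate can exist inside a Python str, but it is not a
# Unicode scalar value, so it cannot appear in well-formed UTF-8.
lone = "\ud800"
try:
    lone.encode("utf-8")
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised

# An unassigned-but-valid scalar value, by contrast, encodes fine:
# being unassigned does not make a sequence ill-formed.
assert "\U0003FFFD".encode("utf-8")
```

This mirrors the distinction in the message: well-formedness constrains code unit sequences, while interpretation of valid scalar-value sequences is largely left open.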
Re: Devanagari and Subscript and Superscript
On Tue, Dec 15, 2015 at 11:55:02AM +, Plug Gulp wrote:

> Please note that the teacher had to use a Circumflex Accent (Caret) to indicate superscript, which is an unwritten convention, in the absence of proper superscript support within Unicode.

If the teacher is explaining actual math to her students, then the superscript is the least of her worries. Math typesetting is two-dimensional, and is so much more complex than regular formatted text (let alone regular plain text) that it needs its own typesetting engines. There are various plain-text markup languages for math, if one really wants to represent complex mathematical notation in plain text.

Regards,
Khaled
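Unicode does already encode superscript forms for a handful of characters, so the caret convention can be mapped to real code points in those few cases. The sketch below (plain Python; the helper name is invented for illustration) also shows why this does not generalise: there are, for example, no superscript Devanagari digits to map to.

```python
# Map ASCII digits to the superscript digits Unicode already encodes
# (U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079).
SUPERSCRIPTS = str.maketrans(
    "0123456789",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079")

def caret_to_superscript(expr: str) -> str:
    """Rewrite a base^exponent expression using existing superscript
    digits. Only works for characters that have superscript forms."""
    base, _, exp = expr.partition("^")
    return base + exp.translate(SUPERSCRIPTS)

assert caret_to_superscript("5^3") == "5\u00B3"    # 5³
assert caret_to_superscript("2^10") == "2\u00B9\u2070"  # 2¹⁰

# Devanagari digits pass through unchanged: no superscript forms exist,
# which is exactly the gap the original poster is describing.
assert caret_to_superscript("\u096B^\u0969") == "\u096B\u0969"
```

This is a workaround for display, not a general mechanism; it cannot express superscripted conjuncts like क्ष at all.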
Re: Devanagari and Subscript and Superscript
On Wed, 9 Dec 2015 03:24:39 + Plug Gulp wrote:

> I am trying to understand if there is a way to use Devanagari characters (and grapheme clusters) as subscript and/or superscript in unicode text.

Why do you want to do this? Are you asking about writing Devanagari vertically rather than horizontally? If that is what you want, you should be looking at mark-up such as is found in cascading style sheets (CSS). It is an important issue for CJK and Mongolian, and there have been questions as to what is needed for Indian scripts. (There's also an antiquarian interest for historical scripts, such as Phags-pa and even Egyptian - moves are afoot to support the hieroglyphic script as plain text.)

Richard.
Re: Devanagari and Subscript and Superscript
On Wed, 9 Dec 2015 03:24:39 + Plug Gulp wrote:

> Hi,
>
> I am trying to understand if there is a way to use Devanagari characters (and grapheme clusters) as subscript and/or superscript in unicode text.

The view is that such would not be 'plain text', and therefore need not be catered for in Unicode. On the other hand, the desire for spacing raised and lowered characters is sufficient that markup to produce them is widely available, as Martin Dürst pointed out. Non-spacing stacked characters are not common enough for general support to be available.

In many Indic scripts, stacking is the normal arrangement, and is supplied via a script-specific special character that is overloaded with a vowel cancellation symbol. However, font-specific deviations from vertical stacking are arranged, and vowel marks are treated independently. There is no provision for vertical stacks to have horizontal offshoots. (Scripts written vertically are a different case.)

For characters stacked directly above and below, not in the normal modern fashion of writing words, there can be special characters for special cases. For example, there are U+A8EE COMBINING DEVANAGARI LETTER PA in the Devanagari Extended block and U+0364 COMBINING LATIN SMALL LETTER E. Other, clumsier scheme-specific techniques are available in other cases. See for example the writing of nuclides with an explicit atomic number in https://en.wikipedia.org/wiki/Nuclide. The notation needs a mass number at top left and an atomic number at bottom left.

A fairly general case is the annotation of kanji known as 'ruby'. Sometimes an application or mark-up scheme will support this directly.

Richard.
Re: Devanagari and Subscript and Superscript
Hello Plug,

I suggest using HTML: ब<sup>क्ष</sup>

Regards, Martin.

On 2015/12/09 12:24, Plug Gulp wrote:

> Hi,
>
> I am trying to understand if there is a way to use Devanagari characters (and grapheme clusters) as subscript and/or superscript in unicode text. It will help if someone could please direct me to any document that explains how to achieve that. Is there a unicode marker that will treat the next grapheme cluster in the unicode text as super/subscript? For e.g. if one wants to represent "ब raise to क्ष" how does one achieve that; is there a marker to represent it as follows: ब + SUP + क + ् + ष, where SUP acts as a marker for superscripting the next grapheme cluster. Similar for subscripting.
>
> Sorry if this is not the right place to ask this question; in that case please could you direct me to the right forum?
>
> Thanks and kind regards
> ~Plug
RE: Devanagari Letter Short A
The character U+0904 (DEVANAGARI LETTER SHORT A) is not a part of ISCII 91. Neither was it encoded in any of the earlier versions of ISCII. Hence according to the ISCII standard this character simply cannot be formed. Aparna A. Kulkarni -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ernest Cline Sent: Monday, February 16, 2004 10:59 AM To: Unicode List Subject: Devanagari Letter Short A I've been trying to make sense of the Indian scripts, but am having one small difficulty. I can't seem to find the ISCII 1991 equivalent for U+0904 (DEVANAGARI LETTER SHORT A). Is this a character that is part of the set accessed by the extended code (xF0) or was this part of the ISCII 1988 standard that did not survive the changes to ISCII 1991? Alternatively, does ISCII encode this as xA4 + xE0 as this would seem to generate the proper glyph even tho it violates the syllable grammar given in Section 8 of ISCII? Or even more alternatively, am I just missing something that should be obvious, but which for some reason I can't see? Even with the slight differences in the naming conventions between ISCII and Unicode, I don't seem to be misplacing any of the other vowels or consonants. Ernest Cline [EMAIL PROTECTED]
Re: Devanagari Letter Short A
From: Aparna A. Kulkarni [EMAIL PROTECTED] To: [EMAIL PROTECTED]; 'Unicode List' [EMAIL PROTECTED] Sent: Thursday, February 19, 2004 8:23 AM Subject: RE: Devanagari Letter Short A

> The character U+0904 (DEVANAGARI LETTER SHORT A) is not a part of ISCII 91. Neither was it encoded in any of the earlier versions of ISCII. Hence according to the ISCII standard this character simply cannot be formed.
>
> Aparna A. Kulkarni

So could this character exist only for the purpose of supporting languages that are not covered by ISCII but that share the same Devanagari script, and is it then needed for countries other than India? (I am thinking here of Dravidian transcriptions.) If there's no ISCII standard related to its meaning or encoding, then what is invalid about coding it with LETTER A followed by the LETTER SHORT E vowel modifier, possibly with an intermediate INV or other ISCII-compatible control? How would this break ISCII compatibility? Aren't there existing practices to represent LETTER SHORT A in ISCII?
Re: Devanagari Letter Short A
Philippe Verdy wrote:

> U+0904 DEVANAGARI LETTER SHORT A is used only for the case of an independent vowel. It can be viewed as a conjunct of the independent vowel U+0905 DEVANAGARI LETTER A and the dependent vowel sign U+0946 DEVANAGARI VOWEL SIGN SHORT E (noted for transcribing Dravidian vowels in the Unicode charts).

You may regard it this way, but that is not so. U+0905 followed by U+0946 is really U+090E. Compare with the other scripts to understand why.

> I don't know why this is not documented, because I can find various sources that use U+0904 or U+0905,U+0946 which have exactly the same rendering and probably the same meaning and usage.

Wow! You have various sources that use a character added to Unicode about two and a half years ago! Impressive!

About the rendering of U+0905,U+0946: since it violates the usual rules, it is up to your system. Mine does not render it properly, though (unless I cheat).

> I think that U+0946 was added in ISCII 1991 but was absent from ISCII 1988

No. It was there even in ISCII 83.

> (I think it's too late to define it: ISCII 1988 has been used consistently before,

Hmm... I have really no evidence that ISCII 1988 was used at all... Would be happy to find one, though...

Antoine
Re: Devanagari Letter Short A
Ernest Cline wrote: I've been trying to make sense of the Indian scripts, but am having one small difficulty. I can't seem to find the ISCII 1991 equivalent for U+0904 (DEVANAGARI LETTER SHORT A). I do not believe you'll find it there. U+0904 had been added to Unicode for version 4.0. In 2001. URL:http://www.unicode.org/consortium/utc-minutes/UTC-089-200111.html Search for 89-C19. Is this a character that is part of the set accessed by the extended code (xF0) or was this part of the ISCII 1988 standard that did not survive the changes to ISCII 1991? No and no. Alternatively, does ISCII encode this as xA4 + xE0 as this would seem to generate the proper glyph even tho it violates the syllable grammar given in Section 8 of ISCII? It does not. At the very least, if you want to generate this character in ISCII this way, try A4 DB E0 (using INV). This is an ugly hack, of course. As an aside, in some version of ISCII (EA-ISCII, notably), A4 E0 is supposed to be equivalent to AD. This is the way the alphabet is sometimes taught to children in India. Antoine
Re: Devanagari Letter Short A
My understanding of the Indian scripts coded in Unicode is that the mapping from ISCII to Unicode is not a straightforward one-to-one mapping, because ISCII uses a contextual encoding for characters (allowing shifts between several scripts) and some rich-text features. The ISCII character model is not exactly the same as the Unicode character model, even though there was an attempt to make this mapping as simple as possible by allocating the Unicode code points for each individual ISCII-supported script in the same relative order, leaving gaps in the Unicode-encoded scripts for ISCII characters that are not used in one specific script. The good reference for how Indian scripts are coded in Unicode is Chapter 9 of the Unicode 4 reference: http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf

In summary, the Unicode model for Devanagari:

- uses consonantal letters with an implied (default) vowel A, modified by the next coded dependent vowel sign (matra) that creates graphic conjuncts with the consonant, or

- uses half-forms of consonants to drop the implied vowel in initial consonants, or

- uses a virama (halant) U+094D to mark other omissions of the implied vowel on dead consonant letters (most often on final consonants, but this occurs as well on initial or medial consonants), by removing the final stem of the full (live) consonant that is normally used to depict a phonetic syllable boundary with a necessary vowel. The virama thus allows creating conjuncts with other following dead or live consonants, and normally attaches both consonant letters into the same syllable or conjunct.

- In some cases, the omission of the implied dependent vowel must not create a ligated conjunct, so the virama still needs to represent the omission of the vowel without creating a conjunct that would break the perceived phonetics, and a ZWNJ is used between the dead consonant (consonant letter + virama) and the next live consonant.
There's a U+0905 pseudo-consonant /a/ which is used in the absence of a phonetic consonant, but it follows the same encoding rule as other consonant letters /*a/, i.e. coding another isolated vowel requires coding /a/ before the vowel sign (matra). This encodes approximately the same thing as isolated vowels, except that the intended rendering is different.

U+0904 DEVANAGARI LETTER SHORT A is used only for the case of an independent vowel. It can be viewed as a conjunct of the independent vowel U+0905 DEVANAGARI LETTER A and the dependent vowel sign U+0946 DEVANAGARI VOWEL SIGN SHORT E (noted for transcribing Dravidian vowels in the Unicode charts). I don't know why this is not documented, because I can find various sources that use U+0904 or U+0905,U+0946, which have exactly the same rendering and probably the same meaning and usage. I think that U+0946 was added in ISCII 1991 but was absent from ISCII 1988 (verify, I don't have the ISCII 1988 reference document), so U+0904 has survived just to allow a mostly one-to-one mapping with ISCII 1988. But maybe I'm wrong here about the addition of U+0946, and there are reasons for this choice. There's no canonical or compatibility equivalence defined between U+0904 and U+0905,U+0946 (I think it's too late to define one: ISCII 1988 has been used consistently before, and the Unicode stability policy now forbids defining new equivalences between them).

- Original Message - From: Ernest Cline [EMAIL PROTECTED] To: Unicode List [EMAIL PROTECTED] Sent: Monday, February 16, 2004 6:28 AM Subject: Devanagari Letter Short A

> I've been trying to make sense of the Indian scripts, but am having one small difficulty. I can't seem to find the ISCII 1991 equivalent for U+0904 (DEVANAGARI LETTER SHORT A). Is this a character that is part of the set accessed by the extended code (xF0) or was this part of the ISCII 1988 standard that did not survive the changes to ISCII 1991?
Alternatively, does ISCII encode this as xA4 + xE0 as this would seem to generate the proper glyph even tho it violates the syllable grammar given in Section 8 of ISCII? Or even more alternatively, am I just missing something that should be obvious, but which for some reason I can't see? Even with the slight differences in the naming conventions between ISCII and Unicode, I don't seem to be misplacing any of the other vowels or consonants. Ernest Cline [EMAIL PROTECTED]
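The code-point level of the model summarized above can be sketched in Python; the rendering comments describe typical shaping behavior, which is the job of the font and shaping engine rather than of these code points themselves:

```python
KA, YA = "\u0915", "\u092F"     # consonant letters with implied /a/
VIRAMA = "\u094D"                # halant: cancels the implied vowel
MATRA_I = "\u093F"               # dependent vowel sign /i/
ZWNJ = "\u200C"

ka = KA                          # /ka/: implied vowel, nothing more to code
ki = KA + MATRA_I                # /ki/: matra replaces the implied /a/
kya = KA + VIRAMA + YA           # /kya/: dead ka typically joins following ya
k_ya = KA + VIRAMA + ZWNJ + YA   # same letters, conjunct rendering suppressed

# The virama is present in both spellings of /kya/; ZWNJ changes only
# the rendering preference, not the underlying letter sequence.
assert k_ya.replace(ZWNJ, "") == kya
assert len(ki) == 2
```

Everything beyond this (half-forms, ligature selection) is presentation: the backing store never records which visual form was chosen.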
Re: Devanagari Glottal Stop
I wrote:

> I would have to disagree with these Indian experts in this instance. The Devanagari glottal stop does not have a dot, and indeed, in the languages which use it, this character will certainly coexist with the question mark. They have different shapes, and different functions.

At 15:03 -0800 2003-04-05, Mark Davis wrote:

> Can you respond back to them with the information as to the languages involved?

I believe they read the Unicore list, don't they, Mark? N2543 and 02/394 show the character used for the Limbu language, and show the glyph without a dot and with a horizontal headbar, which the question mark never has. (It also shows an example where, because the typesetters didn't have the letter available, they substituted a question mark, but that just goes to show that we need to encode this, because it is a letter, not a punctuation mark.)

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Devanagari Glottal Stop
I would have to disagree with these Indian experts in this instance. The Devanagari glottal stop does not have a dot, and indeed, in the languages which use it, this character will certainly coexist with the question mark. They have different shapes, and different functions. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Devanagari Glottal Stop
Can you respond back to them with the information as to the languages involved? Mark ( ) [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message - From: Michael Everson [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, April 05, 2003 01:45 Subject: Re: Devanagari Glottal Stop I would have to disagree with these Indian experts in this instance. The Devanagari glottal stop does not have a dot, and indeed, in the languages which use it, this character will certainly coexist with the question mark. They have different shapes, and different functions. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Devanagari
Vipul Garg wrote: I have downloaded your font chart for Devanagari, which is in the range from 0900 to 097F. I have also installed the Arial Unicode font supplied by Microsoft office XP suite. I found that not all characters are available for Devanagari. For example letters such as Aadha KA, Aadha KHA, Aadha GA etc. are not available. These letters are required in the devanagari words such as KANYA, NANHA, PARMATMA etc. If you could provide the above letters then our requirement for formation of Devanagari words would be possible. This requirement is very crucial as we have a large volume project on Devanagari language involving data storage in Oracle database. You could try using a different font, for example one of the specialist Devanagari fonts listed at: http://www.alanwood.net/unicode/fonts.html#devanagari Alan Wood http://www.alanwood.net (Unicode, special characters, pesticide names)
RE: Devanagari
Vipul Garg was asking why half characters were not included in the Unicode code charts and in his copy of the Arial Unicode font. More recent versions of Arial Unicode do contain half characters etc. for Devanagari. As to the code charts, to answer this you need to explore the Unicode web site a bit more. Please see the following for detailed information regarding the half characters etc.:

http://www.unicode.org/unicode/standard/where/
http://www.unicode.org/unicode/faq/indic.html
http://www.unicode.org/unicode/uni2book/ch09.pdf

Best Regards
Andy

You Wrote:

> I have downloaded your font chart for Devanagari, which is in the range from 0900 to 097F. I have also installed the Arial Unicode font supplied by Microsoft office XP suite. I found that not all characters are available for Devanagari. For example letters such as Aadha KA, Aadha KHA, Aadha GA etc. are not available. These letters are required in the devanagari words such as KANYA, NANHA, PARMATMA etc.
RE: Devanagari
Vipul Garg wrote:

> I have downloaded your font chart for Devanagari, which is in the range from 0900 to 097F. I have also installed the Arial Unicode font supplied by Microsoft office XP suite. I found that not all characters are available for Devanagari. For example letters such as Aadha KA, Aadha KHA, Aadha GA etc. are not available. These letters are required in the devanagari words such as KANYA, NANHA, PARMATMA etc. If you could provide the above letters then our requirement for formation of Devanagari words would be possible. This requirement is very crucial as we have a large volume project on Devanagari language involving data storage in Oracle database. Would appreciate an early reply.

Please see the document "Where is my character?": http://www.unicode.org/unicode/standard/where/

Also have a look at question 17 in the Indic FAQ: http://www.unicode.org/unicode/faq/indic.html#17

All this is explained in more detail in Section 9.1, Devanagari, of the Unicode manual: http://www.unicode.org/unicode/uni2book/ch09.pdf

Regards. M.C.
Re: Devanagari
[EMAIL PROTECTED] scripsit: Au contraire! You might find the attached gif of interest. (This is version 1.0 of the font. Some people might have earlier versions.) Ah, excellent. It has not always been so. If you're not getting Indic shaping with Arial Unicode MS, it's very likely the fault of your software, not the font (and, of course, not Unicode). Indeed, but the original poster specified the use of XP (Windows or Office, I forget which), so I discounted that. -- They do not preach John Cowan that their God will rouse them[EMAIL PROTECTED] A little before the nuts work loose.http://www.ccil.org/~cowan They do not teach http://www.reutershealth.com that His Pity allows them --Rudyard Kipling, to drop their job when they damn-well choose. The Sons of Martha
RE: Devanagari
I came across the 'Where is my character?' page and read that there is a combination of keystrokes to represent the Indic half forms, such as KA and Halant combining to form half KA. There is also a list of other letter representations through combinations of Devanagari letters. Please email me the list for my ready reference. Best Regards, Vipul Garg Mind Axis (I) Solutions Pvt. Ltd. Phone: +91 (22) 55994860 / 61 -Original Message- From: John Cowan [mailto:[EMAIL PROTECTED]] Sent: Tuesday, December 03, 2002 5:33 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Devanagari Vipul Garg scripsit: I have downloaded your font chart for Devanagari, which is in the range from 0900 to 097F. I have also installed the Arial Unicode font supplied by the Microsoft Office XP suite. I found that not all characters are available for Devanagari. For example, letters such as Aadha KA, Aadha KHA, Aadha GA etc. are not available. This is not a Unicode problem. Arial Unicode is not designed to handle Indic scripts; it does not contain the necessary ligatures and half forms. You need to use a more suitable font. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan One time I called in to the central system and started working on a big thick 'sed' and 'awk' heavy-duty data-bashing script. One of the geologists came by, looked over my shoulder and said 'Oh, that happens to me too. Try hanging up and phoning in again.' --Beverly Erlebacher
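The half-form mechanism the FAQ describes can be shown directly at the code-point level: half forms are not separately encoded, but arise from a consonant followed by U+094D VIRAMA (halant). A minimal Python sketch, spelling out the word KANYA from the earlier message:

```python
import unicodedata

# Half KA, half NA, etc. have no code points of their own: a consonant
# plus U+094D VIRAMA renders as the half form when another consonant
# follows. "KANYA" (with a half NA before YA) is encoded as:
KANYA = "\u0915\u0928\u094D\u092F\u093E"   # KA, NA, VIRAMA, YA, vowel sign AA

for ch in KANYA:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

A shaping-capable font and renderer will display the NA + VIRAMA pair as the half form of NA; the underlying storage never contains a "half letter" character.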
RE: Devanagari variations
On Fri, 8 Mar 2002, Marco Cimarosti wrote: Peter Constable wrote: On 03/07/2002 02:16:10 PM James E. Agenbroad wrote: A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display, and at least one Sanskrit textbook agrees. I may be wrong, but I believe that example has ra, halant, ra, independent i. The first ra is the one that transforms into the reph. You are wrong, in fact, sorry. Although figure 9-3 does not show code point values, both the glyphs and the abbreviated letter names make it clear that the sequence is: U+0930 (DEVANAGARI LETTER RA) U+094D (DEVANAGARI SIGN VIRAMA) U+090B (DEVANAGARI LETTER VOCALIC R) James' idea is that the same graphemes could have been better represented with the sequence: U+0930 (DEVANAGARI LETTER RA) U+0943 (DEVANAGARI VOWEL SIGN VOCALIC R) It is an interesting idea, because ra never occurs with matra r., so there is no danger of confusion. But it is probably too late for changing it: it would break compatibility with ISCII and existing Unicode fonts. _ Marco Monday, March 11, 2002 ra as reph does occur with r.; cf. Monier Williams' Sanskrit-English Dictionary, page 554, second column, where between niru_ha and nire (using underscore for macron and for circumflex) are nirr.i and nirr.ich and nirr.ij. I believe ISCII is silent on this matter. If so, how can compatibility with it be broken? If fonts have this glyph, can't they allow two encodings to invoke it? I do not advocate deletion or deprecation of the encoding shown on page 214 of 3.0 for this glyph; I do advocate saying somewhere in the Unicode Standard discussion of Devanagari that there is another, more plausible and more Indian way to encode this glyph.
Regards, Jim Agenbroad ( [EMAIL PROTECTED] ) It is not true that people stop pursuing their dreams because they grow old, they grow old because they stop pursuing their dreams. Adapted from a letter by Gabriel Garcia Marquez. The above are purely personal opinions, not necessarily the official views of any government or any agency of any. Addresses: Office: Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A. Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.
RE: Devanagari variations
Peter Constable wrote: On 03/07/2002 02:16:10 PM James E. Agenbroad wrote: A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display, and at least one Sanskrit textbook agrees. I may be wrong, but I believe that example has ra, halant, ra, independent i. The first ra is the one that transforms into the reph. You are wrong, in fact, sorry. Although figure 9-3 does not show code point values, both the glyphs and the abbreviated letter names make it clear that the sequence is: U+0930 (DEVANAGARI LETTER RA) U+094D (DEVANAGARI SIGN VIRAMA) U+090B (DEVANAGARI LETTER VOCALIC R) James' idea is that the same graphemes could have been better represented with the sequence: U+0930 (DEVANAGARI LETTER RA) U+0943 (DEVANAGARI VOWEL SIGN VOCALIC R) It is an interesting idea, because ra never occurs with matra r., so there is no danger of confusion. But it is probably too late for changing it: it would break compatibility with ISCII and existing Unicode fonts. _ Marco
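For readers without the book at hand, the two candidate spellings Marco compares can be written out as code point sequences. A small Python sketch; note that the two sequences are not canonically equivalent, which is part of why changing the preferred encoding after the fact would be disruptive for searching and matching:

```python
import unicodedata

# Figure 9-3's encoding: RA + VIRAMA + independent LETTER VOCALIC R
encoded = "\u0930\u094D\u090B"
# The alternative under discussion: RA + dependent VOWEL SIGN VOCALIC R
proposed = "\u0930\u0943"

for label, seq in (("figure 9-3", encoded), ("alternative", proposed)):
    print(label, [unicodedata.name(c) for c in seq])

# Normalization does not unify the two spellings: text stored one way
# will not match a query using the other.
print(unicodedata.normalize("NFC", encoded) ==
      unicodedata.normalize("NFC", proposed))
```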
Re: Devanagari variations
At 15:36 -0600 07/03/2002, [EMAIL PROTECTED] wrote: I may be wrong, but I believe that example has ra, halant, ra, independent i . The first ra is the one that transforms into the reph. You're wrong. RI in this case is a way of writing the vocalic r. Compare Kr.s.n.a and Krishna. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari variations
At 15:16 -0500 07/03/2002, James E. Agenbroad wrote: On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote: On 03/06/2002 08:25:18 AM Michael Everson wrote: [snip] In Cham, independent vowels can take dependent vowel signs. In Devanagari, I guess that doesn't occur, but the Brahmic model shouldn't be understood to preclude this behaviour. [snip] - Peter A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display, and at least one Sanskrit textbook agrees. Is that the sample you showed me when I was a-photocopying at the Library of Congress in August, James? You're saying that RA + virama + INDEPENDENT VOCALIC R and RA + VOWEL SIGN VOCALIC R should both produce the same glyph? -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari variations
Using Apple's WorldText, I can confirm that short I did not reorder correctly when preceded by 0294. But the 0294 glyph was in another font. I wonder could we see some samples of this in actual Limbu text? -- Michael Everson *** Everson Typography *** http://www.evertype.com
RE: Devanagari variations
At 11:26 +0100 2002-03-08, Marco Cimarosti wrote: You are wrong, in fact, sorry. Although figure 9-3 does not show code point values, both the glyphs and the abbreviated letter names make it clear that the sequence is: U+0930 (DEVANAGARI LETTER RA) U+094D (DEVANAGARI SIGN VIRAMA) U+090B (DEVANAGARI LETTER VOCALIC R) James' idea is that the same graphemes could have been better represented with the sequence: U+0930 (DEVANAGARI LETTER RA) U+0943 (DEVANAGARI VOWEL SIGN VOCALIC R) It is an interesting idea, because ra never occurs with matra r., so there is no danger of confusion. But it is probably too late for changing it: it would break compatibility with ISCII and existing Unicode fonts. Well, in Apple's WorldText version 1.1 I just typed both of these. The first one displayed as RA VIRAMA (visible) VOCALIC R and the second displayed as REPHA VOCALIC R. So in at least one implementation the latter is supported. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari variations
On 03/08/2002 06:54:54 AM Michael Everson wrote: Using Apple's WorldText, I can confirm that short I did not reorder correctly when preceded by 0294. But the 0294 glyph was in another font. I wonder could we see some samples of this in actual Limbu text? It's on its way. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: Devanagari variations
On 03/08/2002 05:09:46 AM Michael Everson wrote: At 15:36 -0600 07/03/2002, [EMAIL PROTECTED] wrote: I may be wrong, but I believe that example has ra, halant, ra, independent i. The first ra is the one that transforms into the reph. You're wrong. RI in this case is a way of writing the vocalic r. Compare Kr.s.n.a and Krishna. I guess that's what I get for commenting on things beyond my ken. Mea culpa. - Peter
Re: Devanagari variations
Jim Agenbroad responded (off list): Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI vowel: RA(d) + RI(n) -- RI(n) + RA(sup) (parens in lieu of subscript). I didn't realise that RI meant the vocalic R. I mistook it to mean something else. I find it a weakness of that section that such notations are not defined and prominently displayed in an easy-to-find location. Thanks for setting me straight. I should have known you knew what you were talking about. Peter
Re: Devanagari variations
[EMAIL PROTECTED] scripsit: I didn't realise that RI meant the vocalic R. It reflects the modern Hindi pronunciation of Skt /r=/. -- John Cowan [EMAIL PROTECTED] http://www.reutershealth.com http://www.ccil.org/~cowan "I amar prestar aen, han mathon ne nen, han mathon ne chae, a han noston ne 'wilith." --Galadriel, _LOTR:FOTR_
RE: Devanagari enthousiasm!
It appears that hindi.exe installs Uniscribe - which, AFAIK, is not permitted by Microsoft - so much for honouring license agreements! That's another reason why they'd package it as an EXE. - rick cameron -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 6 March 2002 12:14 To: Yaap Raaf Cc: [EMAIL PROTECTED] Subject: Re: Devanagari enthousiasm! On 06-03-2002 04:29:20 PM Yaap Raaf wrote: At 14:02 +0100 2002.03.06, [EMAIL PROTECTED] wrote: I am on a Mac and can't open it, Well, this is going to be a problem for non-Windows clients, I admit. it's a 244K .exe Why an .exe? I don't know if this is what the BBC was trying to do, but using an executable installer package is at least one way to make sure people see the license agreement... Bob
Re: Devanagari variations
At 10:29 -0600 2002-03-08, [EMAIL PROTECTED] wrote: Jim Agenbroad responded (off list): Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI vowel: RA(d) + RI(n) -- RI(n) + RA(sup) (parens in lieu of subscript). I didn't realise that RI meant the vocalic R. I mistook it to mean something else. I find it a weakness of that section that such notations are not defined and prominently displayed in an easy-to-find location. Actually, I would like to see that written R with dot below. We should use decent transliteration in those notations; why not? -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari variations
On Fri, 8 Mar 2002, Michael Everson wrote: At 15:16 -0500 07/03/2002, James E. Agenbroad wrote: On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote: On 03/06/2002 08:25:18 AM Michael Everson wrote: [snip] In Cham, independent vowels can take dependent vowel signs. In Devanagari, I guess that doesn't occur, but the Brahmic model shouldn't be understood to preclude this behaviour. [snip] - Peter A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display, and at least one Sanskrit textbook agrees. Is that the sample you showed me when I was a-photocopying at the Library of Congress in August, James? You're saying that RA + virama + INDEPENDENT VOCALIC R and RA + VOWEL SIGN VOCALIC R should both produce the same glyph? -- Michael Everson *** Everson Typography *** http://www.evertype.com Friday, March 8, 2002 Michael, Yes. [Call me Jim] Regards, Jim Agenbroad ( [EMAIL PROTECTED] )
Re: Devanagari variations
On Fri, 8 Mar 2002 [EMAIL PROTECTED] wrote: Jim Agenbroad responded (off list): Not quite. On page 214 of 3.0 there is one RA vowel, a halant and a RI vowel: RA(d) + RI(n) -- RI(n) + RA(sup) (parens in lieu of subscript). I didn't realise that RI meant the vocalic R. I mistook it to mean something else. I find it a weakness of that section that such notations are not defined and prominently displayed in an easy-to-find location. Thanks for setting me straight. I should have known you knew what you were talking about. Peter Friday, March 8, 2002 Peter, I agree there is a weakness there. Maybe more than one. I have mailed you (Peter) the Deshpande and Monier Williams examples I cited. Have a nice weekend all! Regards, Jim Agenbroad ( [EMAIL PROTECTED] )
RE: Devanagari variations
implementations might not recognise a sequence like consonant, vowel, nukta as valid. For instance, I understand that if Uniscribe encountered such a sequence, it would assume you've left out a consonant immediately before the nukta, and it would display a dotted circle to indicate where a missing base character should go. That behaviour, IMHO, is incorrect. There is no, and never was, any kind of grapheme or even combining-sequence break at that point, and there should never be a dotted circle displayed through that sequence of characters (a show-individual-characters mode should of course be excepted). /kent k
Re: Devanagari enthousiasm!
On 03/06/2002 03:12:20 PM Michael Everson wrote: But a font is not an ISO/IEC 10646 subset! By definition, it contains glyph codes, not character codes. They are in two different worlds. But in public procurement a subset may be specified, in which case ASCII will be implied. I don't know who made up this rule, by the way. I've never seen any font vendor advertise the capabilities of their fonts in terms of ISO 10646 subsets. In fact, I've never seen any reference to ISO 10646 subsets outside discussions of ISO 10646 such as this. - Peter
RE: Devanagari variations
That behaviour, IMHO, is incorrect. There is no, and never was, any kind of grapheme or even combining-sequence break at that point, and there should never be a dotted circle displayed through that sequence of characters (a show-individual-characters mode should of course be excepted). I agree. What I'm hoping to find out is whether developers of various (whatever) software products can verify whether their code behaves correctly in this regard. This would likely be speaking to ICU, Java or other implementations of Devanagari rendering, or Devanagari fonts (OT or otherwise). - Peter
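The dotted-circle behaviour under discussion can be modelled with a toy validity check. This is a hedged sketch of the assumption Peter attributes to Uniscribe (that nukta may only follow a consonant), not Uniscribe's actual code; the consonant range is deliberately simplified:

```python
# Toy model of a strict cluster rule: a renderer that only accepts
# NUKTA (U+093C) immediately after a consonant will flag the sequence
# consonant + vowel sign + nukta, even though some lesser-known
# languages need nukta on vowel signs.
NUKTA = "\u093C"

def is_consonant(ch):
    # Simplified: Devanagari KA..HA only
    return "\u0915" <= ch <= "\u0939"

def strict_engine_accepts(seq):
    """Return False where this strict model would insert a dotted circle."""
    for prev, ch in zip(seq, seq[1:]):
        if ch == NUKTA and not is_consonant(prev):
            return False
    return True

print(strict_engine_accepts("\u0915\u093C"))        # KA + nukta
print(strict_engine_accepts("\u0915\u093E\u093C"))  # KA + vowel sign AA + nukta
```

Kent's point is that the second sequence should be rendered as one combining sequence, not rejected; the fix is in the validity assumption, not in the encoding.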
Re: Devanagari variations
On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote: On 03/06/2002 08:25:18 AM Michael Everson wrote: [snip] In Cham, independent vowels can take dependent vowel signs. In Devanagari, I guess that doesn't occur, but the Brahmic model shouldn't be understood to preclude this behaviour. [snip] - Peter A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display, and at least one Sanskrit textbook agrees. Didn't Mark Twain say he didn't think much of a person who could spell a word in only one way? Regards, Jim Agenbroad ( [EMAIL PROTECTED] )
Re: Devanagari variations
On 03/07/2002 02:16:10 PM James E. Agenbroad wrote: A similar but not the same situation is found in the fourth example in figure 9-3 of Unicode 3.0 (page 214) where an independent vowel has the reph (an abridged form of the consonant 'ra') above it. Unicode wants this encoded as consonant + halant + independent vowel. I believe it is better considered as a consonant + vowel sign combination which happens to have an odd display, and at least one Sanskrit textbook agrees. I may be wrong, but I believe that example has ra, halant, ra, independent i. The first ra is the one that transforms into the reph. - Peter
Re: Devanagari variations
I have gotten the answer to the question Michael raised about the glottal stop: it does *not* have an inherent vowel. So, given that, I return to the original question: quote The question is whether there is any problem using U+0294, and whether proposing a Devanagari-specific character would be a better option. One particular problem I think would be likely to occur is that rendering engines such as Uniscribe, or whatever is coded into host environments like Java for Hindi support, would not be able to cope with U+0294 occurring in the midst of a Devanagari sequence. E.g. I could easily imagine something like Uniscribe failing to reorder U+093F before a glottal U+0294. Should we try to educate and convince implementers of the need to allow U+0294 to be reckoned as part of the Devanagari script, or should we propose a new Devanagari glottal character? (I guess on the principle of unifying across languages but not across scripts, it could be argued that a new character should be proposed.) /quote - Peter - Forwarded by Peter Constable/IntlAdmin/WCT on 03/07/2002 10:10 PM - Jeff LB Webster [EMAIL PROTECTED] 03/07/2002 08:59 PM To: [EMAIL PROTECTED] cc: Subject: Re: Devanagari variations Peter, I responded to Steve also, but the short answer is NO, the glottal has no inherent vowel. Jeff
Re: Devanagari enthousiasm!
On 06-03-2002 09:59:48 Yaap Raaf wrote: Win98: You need something called Opentype Devanagri fonts to VIEW the Hindi unicode text. You can get a good font for free from BBC Hindi site. Except that the license that accompanies the font says: COPYRIGHT AND ALL OTHER RIGHT, TITLE AND INTEREST IN THE SOFTWARE BELONGS TO THE BBC, ITS LICENSORS AND SUPPLIERS. THE BBC IS LICENSED BY ITS LICENSORS TO USE AND DISTRIBUTE THE SOFTWARE ON ITS WEBSITE WWW.BBC.CO.UK (THE WEBSITE) AND TO PERMIT YOU TO DOWNLOAD IT, COPY IT ON TO YOUR PERSONAL COMPUTER AND USE IT FOR THE PURPOSES OF ACCESSING, VIEWING AND MAKING, FOR YOUR OWN PERSONAL USE ONLY, ONE PRINT COPY OF THE HINDI LANGUAGE VERSION OF THE WEBSITE. OTHER THAN AS PROVIDED FOR IN THIS SECTION, YOU MAY NOT MAKE ANY USE WHATSOEVER OF THE SOFTWARE OR THE HINDIFONT. I interpret this to mean one may not legitimately use this font for any purpose other than viewing the BBC website. Bob
Re: fj ligature [Re: Devanagari variations]
On Wednesday, March 6, 2002, at 03:24 AM, Herman Ranes wrote: There is a related problem in connection with Norwegian typography: Most fonts include the 'fi' and 'ffi' ligatures, but I have never heard of a commercial font which includes the 'fj' ligature. Apple's Hoefler font contains an fj ligature. If I'm not mistaken, most of Adobe's Pro fonts do, too. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Devanagari variations
At 00:12 -0600 2002-06-03, [EMAIL PROTECTED] wrote: (1) The first problem is the need for a glottal character for Limbu (ie, the Limbu language written in Devanagari script, as opposed to Limbu script, which already has a symbol for glottal). The Limbu language committee has decided that this character should be represented using what looks pretty much like the IPA glottal symbol (U+0294), though in a Devanagari font it would have to be designed to match Devanagari characters. I see this in version 2 of the Nepali White Paper http://www.cicc.or.jp/english/hyoujyunka/mlit3/7-7-2.pdf The question is whether there is any problem using U+0294, and whether proposing a Devanagari-specific character would be a better option. One particular problem I think would be likely to occur is that rendering engines such as Uniscribe, or whatever is coded into host environments like Java for Hindi support, would not be able to cope with U+0294 occurring in the midst of a Devanagari sequence. E.g. I could easily imagine something like Uniscribe failing to reorder U+093F before a glottal U+0294. That almost answers my first question. Does the Devanagari glottal have an inherent vowel? If it does, encode a new character. (2) The second problem involves nukta (U+093C). In better-known languages, nukta can occur only on consonants, but for certain lesser-known languages, it can occur on vowels as well. Yet some implementations might not recognise a sequence like consonant, vowel, nukta as valid. For instance, I understand that if Uniscribe encountered such a sequence, it would assume you've left out a consonant immediately before the nukta, and it would display a dotted circle to indicate where a missing base character should go. So what would you suggest? A vocalic nukta? I wouldn't like that. In Cham, independent vowels can take dependent vowel signs. In Devanagari, I guess that doesn't occur, but the Brahmic model shouldn't be understood to preclude this behaviour.
Our people in South Asia have told me the nukta can occur on vowels in the range U+093e..U+094c, though my contact has told me that he himself has only seen this on 093E, 0940, 0941 and 094B. Um, that's AA, II, U, and O. What does the nukta make them sound like? -- Michael Everson *** Everson Typography *** http://www.evertype.com
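The run-segmentation worry behind the U+0294 question can be sketched as follows. This is an illustrative toy, not any real shaper: it shows how an engine that splits text into script runs by code block would isolate the glottal and the following matra into separate runs, so U+093F VOWEL SIGN I would never reorder before the glottal:

```python
# Toy script-run segmenter. A real shaper uses the Unicode Script
# property and smarter run merging; this block-range check is a
# simplifying assumption for illustration only.
def toy_script(ch):
    cp = ord(ch)
    if 0x0900 <= cp <= 0x097F:
        return "Devanagari"
    return "Other"   # U+0294 LATIN LETTER GLOTTAL STOP lands here

def runs(text):
    out = []
    for ch in text:
        s = toy_script(ch)
        if out and out[-1][0] == s:
            out[-1] = (s, out[-1][1] + ch)
        else:
            out.append((s, ch))
    return out

# KA, glottal stop, VOWEL SIGN I: the matra ends up in a separate
# run from its base, so pre-base reordering cannot happen.
print(runs("\u0915\u0294\u093F"))
```

Encoding a Devanagari-specific glottal character (inside the 0900 block) would sidestep the problem; teaching shapers to treat U+0294 as part of a Devanagari run is the alternative Peter raises.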
Re: fj ligature [Re: Devanagari variations]
* Herman Ranes [EMAIL PROTECTED] [2002-03-06 11:24]: There is a related problem in connection with Norwegian typography: Most fonts include the 'fi' and 'ffi' ligatures, but I have never heard of a commercial font which includes the 'fj' ligature. From the Adobe OpenType user guide: (http://www.adobe.com/type/browser/pdfs/OTGuide.pdf) # Many Adobe Pro fonts include a large set of standard ligatures, such as # fi, fl, ffi, ffl, ff, fj, ffj, Th, and others. Most other Adobe OpenType # fonts have a smaller set of standard ligatures, like those in Type 1 # fonts. So, you can find some, at least from Adobe, and I guess other quality font vendors. -- * [EMAIL PROTECTED]
Re: Devanagari enthousiasm!
At 14:02 +0100 2002.03.06, [EMAIL PROTECTED] wrote: I interpret this to mean one may not legitimately use this font for any purpose other than viewing the BBC website. If http://www.bbc.co.uk/hindi/images/download_text.gif is any indication, the font doesn't look too promising. Have you seen it? I am on a Mac and can't open it; it's a 244K .exe. Why an .exe? BTW, the menus and headers at http://www.bbc.co.uk/hindi/ come out as gibberish on my screen; the articles are fine. (MacOS 9.0) The great enthusiasm in the message I forwarded was what struck me, and then, several times the word 'STANDARD' in capitals! People have to see results before they believe, and spread the word. There was another message announcing the Raghu font. Subject: Free Unicode Hindi fonts From: Dakshin Shantakumar [EMAIL PROTECTED] Newsgroups: alt.language.hindi soc.culture.indian Date: 2 Mar 2002 13:51:45 -0800 Downloadable here http://www.ncst.ernet.in/~matra/hindi_display.shtml It has about 600 glyphs. But no Latin letters, which, IIRC, disqualifies it as a real Unicode font? Some of the glyphs are unfamiliar to me. Yaap -- attachment: Raghu_glyphs.jpg
Re: Devanagari enthousiasm!
At 17:29 +0100 2002-06-03, Yaap Raaf wrote: There was another message announcing Raghu font. Subject: Free Unicode Hindi fonts From:Dakshin Shantakumar [EMAIL PROTECTED] Newsgroups: alt.language.hindi soc.culture.indian Date:2 Mar 2002 13:51:45 -0800 Downloadable here http://www.ncst.ernet.in/~matra/hindi_display.shtml It has about 600 glyphs. But no Latin letters, which, IIRC, disqualifies it as a real Unicode font? Some of the glyphs are are unfamiliar to me. The first one is RA + VIRAMA + VOCALIC R, it looks to me. That would be, in principle, a valid but rare syllable. Maybe it's Vedic. The second one, I'm not sure what to think of. It looks like NGA + RA + NUKTA + VOCALIC R, and I couldn't begin to wonder what it's supposed to represent. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: fj ligature [Re: Devanagari variations]
On 03/06/2002 04:24:54 AM Herman Ranes wrote: There is a related problem in connection with Norwegian typography: Most fonts include the 'fi' and 'ffi' ligatures, but I have never heard of a commercial font which includes the 'fj' ligature. That's quite a different problem. All it would take to support that in Uniscribe/OpenType would be to add the appropriate ligature mapping in a GSUB OpenType table; nothing needs to change in Uniscribe. But for cons, vow, nukta, Uniscribe will insert U+25CC as a UI device to let the user know that something is wrong with the data, and that's based on the assumption that nukta can only go on a consonant. To solve your problem, you just need to educate type designers of the need. It's relatively simple to solve. - Peter
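The GSUB ligature mechanism Peter refers to amounts to a sequence-to-glyph substitution table consulted by the shaper. A toy Python model of a 'liga' lookup (glyph names are illustrative, not taken from any real font):

```python
# Toy model of an OpenType GSUB 'liga' lookup: replace a matching
# glyph sequence with a single ligature glyph. Supporting 'fj' is
# just one more entry in the table; the shaping engine is unchanged.
LIGATURES = {
    ("f", "f", "i"): "f_f_i",
    ("f", "i"): "f_i",
    ("f", "j"): "f_j",   # the ligature most fonts omit
}

def apply_ligatures(glyphs):
    out, i = [], 0
    while i < len(glyphs):
        for n in (3, 2):   # try the longest match first
            seq = tuple(glyphs[i:i + n])
            if seq in LIGATURES:
                out.append(LIGATURES[seq])
                i += n
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out

print(apply_ligatures(list("fjerde")))   # ['f_j', 'e', 'r', 'd', 'e']
```

This is why the fj case is a font issue rather than an encoding one: the text still contains the characters f and j, and only the glyph stream changes.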
Re: fj ligature [Re: Devanagari variations]
At 02:24 3/6/2002, Herman Ranes wrote: There is a related problem in connection with Norwegian typography: Most fonts include the 'fi' and 'ffi' ligatures, but I have never heard of a commercial font which includes the 'fj' ligature. Using such a font, the word 'fire' (four) would be ligated correctly, while 'fjerde' (fourth) would not. And exactly what does the rendering of the 'international' loan-word /fjord/ look like in printed matter around the world? I regularly find it unligated in English and German reference works which have in other respects virtually perfect typography. With the increased support of OpenType layout, I think you will see more fonts supplying an fj ligature. The Adobe Pro fonts contain this ligature, as do any Latin script fonts I have produced for clients over the past three years. Most type designers I know are aware of the need for this glyph, but until now they have not had a standard and reliable way to make it available. This is not really the same issue as Peter has raised with regard to Limbu, where the issue is less the availability of a particular glyph than the handling of a character by shaping engines that might fail to identify it as part of the surrounding text. The former is a typographic and font issue, while the latter is a text processing issue and may be an encoding issue. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Re: Devanagari enthousiasm!
At 08:29 3/6/2002, Yaap Raaf wrote: There was another message announcing the Raghu font. Subject: Free Unicode Hindi fonts From: Dakshin Shantakumar [EMAIL PROTECTED] Newsgroups: alt.language.hindi soc.culture.indian Date: 2 Mar 2002 13:51:45 -0800 Downloadable here http://www.ncst.ernet.in/~matra/hindi_display.shtml It has about 600 glyphs. But no Latin letters, which, IIRC, disqualifies it as a real Unicode font? No, a Unicode font does not need to contain Latin letters. There are issues regarding using such fonts in Windows 9x and ME, because these systems require 8-bit codepage support and there are no MS codepages for Indic scripts. This is why MS Mangal, the Hindi UI font that ships with Windows 2000 and XP, is not licensed for use on older versions of the OS. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED]
Re: Devanagari enthousiasm!
At 11:03 -0800 2002-06-03, John Hudson wrote: It has about 600 glyphs. But no Latin letters, which, IIRC, disqualifies it as a real Unicode font? No, a Unicode font does not need to contain Latin letters. A valid ISO/IEC 10646 subset must contain ASCII. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari enthousiasm!
At 11:03 -0800 2002-06-03, John Hudson wrote: No, a Unicode font does not need to contain Latin letters. And Michael Everson responded: A valid ISO/IEC 10646 subset must contain ASCII. But a font is not an ISO/IEC 10646 subset! By definition, it contains glyph codes, not character codes. They are in two different worlds. So it's still true that a font compatible with Unicode need not contain Latin letters. Rick
Re: Devanagari enthousiasm!
On 06-03-2002 04:29:20 PM Yaap Raaf wrote: At 14:02 +0100 2002.03.06, [EMAIL PROTECTED] wrote: I am on a Mac and can't open it, Well, this is going to be a problem for non-Windows clients, I admit. it's a 244K .exe Why an .exe? I don't know if this is what the BBC was trying to do, but using an executable installer package is at least one way to make sure people see the license agreement... Bob
Re: Devanagari enthousiasm!
Michael Everson scripsit: A valid ISO/IEC 10646 subset must contain ASCII. But a 10646 subset is a coded character set, not a font. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
Re: Devanagari enthousiasm!
At 12:07 -0800 2002-06-03, Rick McGowan wrote: At 11:03 -0800 2002-06-03, John Hudson wrote: No, a Unicode font does not need to contain Latin letters. And Michael Everson responded: A valid ISO/IEC 10646 subset must contain ASCII. But a font is not a ISO/IEC 10646 subset! By definition, it contains glyph codes, not character codes. They are in two different worlds. But in public procurement a subset may be specified, in which case ASCII will be implied. I don't know who made up this rule, by the way. So it's still true that a font compatible with Unicode need not contain Latin letters. OK. Caveat emptor. -- Michael Everson *** Everson Typography *** http://www.evertype.com
10646 subsets (was: Re: Devanagari enthousiasm!)
Michael Everson said: No, a Unicode font does not need to contain Latin letters. A valid ISO/IEC 10646 subset must contain ASCII. Besides others pointing out the obvious disconnect between 10646 subsets and what can be in a valid Unicode font (which contains glyphs, not characters), this statement is not correct even in its proper context. To cite chapter and verse, 10646 defines two kinds of subsets: Limited subsets (clause 12.1) are simply enumerations of any list of code points (code positions, in 10646-speak). There are no constraints on this kind of subset, so it could consist merely of a list of Hebrew combining marks, for example. Selected subsets (clause 12.2) consist of lists of collections from Annex A. It is *selected* subsets which automatically contain U+0020..U+007E. And note that it is only *those* code points which are included, and not ASCII -- which would also imply inclusion of U+0000..U+001F and U+007F. --Ken
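Ken's arithmetic can be checked directly: the collection mandated for selected subsets, U+0020..U+007E, covers 95 code points, while ASCII proper covers 128, so 33 code points (the C0 controls plus DEL) are not implied. A minimal check:

```python
# Code points mandated for 10646 "selected" subsets: U+0020..U+007E inclusive.
mandated = range(0x20, 0x7F)            # range end is exclusive, so 0x7E is included
ascii_full = range(0x00, 0x80)          # ASCII proper: U+0000..U+007F

print(len(mandated))                    # 95 graphic characters (space through tilde)
print(len(ascii_full) - len(mandated))  # 33 code points NOT implied: 32 C0 controls + DEL
```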
Re: Devanagari variations
On 03/06/2002 08:25:18 AM Michael Everson wrote: That almost answers my first question. Does Devanagari glottal have an inherent vowel? If it does, encode a new character. That seems like a very good metric to consider, and I hadn't thought of it myself. I'd expect that this can be used syllable-initially rather than only finally, and so would have an inherent vowel, but I don't know that for certain. I've asked my contacts working in S. Asia for further info. (2) The second problem involves nukta (U+093C). In better-known languages, nukta can occur only on consonants, but for certain lesser-known languages, it can occur on vowels as well. Yet some implementations might not recognise a sequence like consonant, vowel, nukta as valid. For instance, I understand that if Uniscribe encountered such a sequence, it would assume you've left out a consonant immediately before the nukta, and it would display a dotted circle to indicate where a missing base character should go. So what would you suggest? A vocalic-nukta? I wouldn't like that. No, I wouldn't suggest anything different. The question is mainly intended to find out to what extent implementers are making assumptions that would present problems. In Cham, independent vowels can take dependent vowel signs. In Devanagari, I guess that doesn't occur, but the Brahmic model shouldn't be understood to preclude this behaviour. There's a general problem: writing systems of lesser-known languages sometimes involve behaviours that don't occur in the writing systems of better-known languages, but software implementations get designed based upon what is known, meaning the better-known writing systems only, and sometimes implementations incorporate constraints based upon what is exemplified in those better-known writing systems. E.g. 
there are Mon-Khmer languages spoken in Thailand that get written with Thai script but have many more vowel distinctions than Thai and so need to use combinations of combining marks not used in combination for Standard Thai, yet some important software implementations incorporate sequence constraints that treat these combinations as error conditions. Um, that's AA, II, U, and O. What does the nukta make them sound like? I haven't any idea, myself. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: Devanagari keyboard for WindowsXP
On 02/23/2002 08:58:28 PM yaapraaf wrote: Now I have one more question, related to the following: # Subject: Re: How to create Unicode input methods for MacOS? (long) # Our 'uchr' resources are created using an assembler. It's the # only tool we are aware of that can fill in the offsets the 'uchr' # data structure contains. We don't have a custom 'uchr' editing # tool at this point (we would love to have one...). If such a comparison could be made, does the Keyman wizard represent the kind of 'intelligence' of the assembler, or is there a fundamental difference between Windows and Mac systems that makes keyboard editing on Windows less difficult than it is for the Mac in the case of 'uchr'? What's the difference? I don't know these things in detail; when I need to know the details I go to my co-worker Jonathan Kew. I have some ideas, though. As I understand it, there are some significant differences. First, my understanding is that the Mac uchr resources (and their predecessor, kchr resources) are tables compiled in a binary format that map from scan codes to characters, usually in a 1:1 manner, though I know kchr could support dead keys -- i.e. a one-level subtable, so that if a key designated to be a deadkey was pressed, then the following keystroke used a different set of lookups. If I recall, kchr resources were part of a script bundle, meaning that each kchr resource had a script code, and there could be at most one kchr resource per script code. I seem to recall that there were 32 possible script codes, but I'd really need to check the docs to make sure. (You can find this stuff on the Apple site if you know where to dig.) I don't know what happens in these regards with uchr resources. My understanding of these kinds of low-level details is also imperfect, but a little better than for the Mac. (When I really need to know details in this area that I'm rusty on, I ask Marc Durdin, the author of Keyman.) 
From what I understand, Windows is somewhat different, and adding Keyman into the mix makes it even more different. Windows uses individual files -- .kbd on Win9x/Me and .dll on NT/2K/XP -- that contain mapping tables to map scan codes into virtual-key codes and character codes. There can be any number of these on a system, but they have to be associated with a LANGID to use them. Win32 allows lots of LANGIDs -- way more than the number of script codes allowed in QuickDraw. Win32 also allows multiple kbds/dlls to be associated with a given LANGID, except that there's something in Win9x/Me that keeps this from working (I don't know if it's just a UI problem or something at a lower level). I don't know exactly what kinds of mappings can be created in a kbd/dll, but it wouldn't surprise me to learn that it's pretty much comparable to what could be done in a KCHR resource. Now, Keyman is really an entirely different mechanism. It doesn't create a distinct kbd/dll, but makes use of one on the system. It will intercept what is generated by the system, and use its own mechanisms to map to character codes. And its mechanisms are *far* richer than 1:1 mappings or 1:1 mappings plus deadkeys. This is so because it was first created for use with Win3.x to create presentation form-encoded data involving scripts of SE Asia. In other words, it needed to handle some of the kinds of transformations that are handled today by Uniscribe and OpenType. You can think of Keyman's wizard as similar to an assembler, except that what it generates is not processor assembly code but rather rules in the Keyman keyboard description language. For instance, if I use the Keyman wizard and drag the shape for a Devanagari KA (U+0915) onto the K key in the screen representation of a keyboard, then it will generate the rule + 'k' > U+0915 in the KMN text file that constitutes the programming code for the input method being created. 
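The two mapping models Peter describes -- a plain 1:1 keystroke table versus a table with dead keys -- can be sketched in a few lines. This is an illustrative model only (the table contents and function name are hypothetical, not Keyman's or Apple's actual formats): a dead key switches the next keystroke to a secondary lookup, while an ordinary key maps directly, like the Keyman-style rule just mentioned.

```python
# Hypothetical sketch of keystroke-to-character mapping with one dead key.
BASE = {"k": "\u0915", "g": "\u0917"}          # e.g. k -> DEVANAGARI LETTER KA
DEAD = {"`": {"a": "\u00E0", "e": "\u00E8"}}   # ` is a dead key: ` then a -> à

def map_keys(keystrokes):
    out, pending = [], None
    for key in keystrokes:
        if pending is not None:                # previous key was a dead key:
            out.append(DEAD[pending].get(key, key))  # consult its sub-table once
            pending = None
        elif key in DEAD:                      # remember the dead key, emit nothing yet
            pending = key
        else:
            out.append(BASE.get(key, key))     # plain 1:1 lookup
    return "".join(out)

print(map_keys("k"))    # क  (direct mapping)
print(map_keys("`a"))   # à  (dead-key sequence)
```

Keyman's rule language goes well beyond this (context-sensitive rules, reordering), which is why it suited presentation-form encodings.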
After you have used the wizard, you can revise the KMN program in whatever way you want, just as if you hadn't used the wizard but were writing the behaviour by hand -- though you wouldn't need to do this if your input method only requires simple, context-free keystroke-to-character mappings. It may be clear I don't know an assembler from a wizard, but I'm just amazed about the 'problem' at Apple and the (seemingly) simple solution for Windows in this regard. Several years ago, Jonathan Kew created the SILKey program, which is basically Keyman for the Mac. (There was a misunderstanding about some details in the Keyman documentation that resulted in some slight differences between Keyman's description language and that used by SILKey, but eventually both programs had been revved so that a single description could be written to be used on both platforms.) SILKey is available from the SIL web site. But, it has *not* been updated to support Unicode. There are some issues that would be involved in doing so, and since there aren't too many Unicode apps
Re: Devanagari keyboard for WindowsXP
On 02/22/2002 05:26:53 PM Yaap Raaf wrote: I've also been looking at Tavultesoft's Keyman, but there are no readymade keyboards available for the purpose. I don't know how complicated it is to develop one. For simple behaviours, it can be quite easy; e.g. if you just need to assign Devanagari characters to keys on a US keyboard without any additional behaviour considerations, there's a wizard with a visual UI that you can use. If you need, though, you can create fairly sophisticated input methods. How difficult it is depends on how complex the required behaviour is, but the learning curve is **far** less than learning to program in C or to use the Windows DDK. And I've never heard of anyone complaining that a Keyman input method could not keep up with a user's typing speed. BTW, Keyman 6 (in development) will support Microsoft's Text Services Framework, and also (I think) rules that are sensitive to both preceding and following context. That should make it possible to create fairly polished text-editing user interfaces even for Indic scripts or Arabic, where there are complex character/presentation relationships involved. (E.g. I'd guess that Arabic users would find it less distracting if characters appeared initially in non-final forms and only change to final forms after a word-breaking or non-joining character has been entered. If that's true, I can envision implementing that without too much difficulty using Keyman.) - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: Devanagari keyboard for WindowsXP
At 15:08 +0100 2002.02.23, [EMAIL PROTECTED] wrote: On 02/22/2002 05:26:53 PM Yaap Raaf wrote: I've also been looking at Tavultesoft's Keyman, but there are no readymade keyboards available for the purpose. I don't know how complicated it is to develop one. For simple behaviours, it can be quite easy; e.g. if you just need to assign Devanagari characters to keys on a US keyboard without any additional behaviour considerations, there's a wizard with a visual UI that you can use. If you need, though, you can create fairly sophisticated input methods. How difficult it is depends on how complex the required behaviour is, but the learning curve is **far** less than learning to program in C or to use the Windows DDK. Thanks Peter, I had hoped for your response as you are the Keyman specialist here. A few others responding off list were of the same opinion. So, no aksharmala, but Keyman. Now I have one more question, related to the following: # X-UML-Sequence: 13313 (2000-04-17 20:30:54 GMT) # X-Nutritional-Content-Warning: Message body contains 53% quoted lines # From: Deborah Goldsmith [EMAIL PROTECTED] # To: Unicode List [EMAIL PROTECTED] # Cc: Marco Piovanelli [EMAIL PROTECTED] # Date: Mon, 17 Apr 2000 12:30:53 -0800 (GMT-0800) # Subject: Re: How to create Unicode input methods for MacOS? (long) # [.] # [.] # Our 'uchr' resources are created using an assembler. It's the # only tool we are aware of that can fill in the offsets the 'uchr' # data structure contains. We don't have a custom 'uchr' editing # tool at this point (we would love to have one...). If such a comparison could be made, does the Keyman wizard represent the kind of 'intelligence' of the assembler, or is there a fundamental difference between Windows and Mac systems that makes keyboard editing on Windows less difficult than it is for the Mac in the case of 'uchr'? What's the difference? 
It may be clear I don't know an assembler from a wizard, but I'm just amazed about the 'problem' at Apple and the (seemingly) simple solution for Windows in this regard. Yaap --
RE: Devanagari
David Starner wrote: On Mon, Jan 21, 2002 at 02:20:17PM +0100, Marco Cimarosti wrote: What this means in practice for website developers is: 1) SCSU text can only be edited with a text editor which properly decodes the *whole* file on load and re-encodes it on save. On the other hand, UTF-8 text can also be edited using an encoding-unaware editor, although non-ASCII text is invisible. True for users of Latin-based writing systems. Probably of little comfort to users of Indic or Chinese-based writing systems. I was referring to the task of editing *source* files in HTML, XML, or other computer languages and formats. Most of the time, programmers and webmasters are interested in changing the ASCII part of the file (mark-up, instructions), which is the part most likely to contain bugs to be fixed, or to need changes unrelated to the linguistic content. Of course, the people in charge of writing the *content* need tools that can display the actual characters. And this is true for users of Latin-based writing systems as well: imagine writing in French or German with all occurrences of é, è, ä, ö, ü, etc. transformed into pairs of funny bytes. Better to stick with editors that are aware of your encoding. Of course. Provided that one exists on your platform, and that you are not bound to development tools which don't support it. 2) SCSU text cannot be built by assembling binary pieces coming from external sources. It's not really designed for that. If you're assembling things, just run the output through a UTF-8 to SCSU converter. Which translates to: SCSU is not appropriate for dynamic HTML pages, or for encoding text inside any other kind of application. More generally, SCSU is not appropriate as a text encoding, but just as a compression method for documents in their final form. Ciao. _ Marco
Re: Devanagari on MacOS 9.2 and IE 5.1
It should be fine also on Netscape 6.2.

[EMAIL PROTECTED] wrote:

> I spoke too fast. Upon taking a closer look at the file, the font was not set properly. MacOS 9.2, Indian Language Kit, Mac IE 5.1 and Devanagari MT as font face seem to display UTF-8 encoded Hindi just fine.
> Etienne
>
> Date: Mon, 21 Jan 2002 10:24:16 -0800
> Subject: RE: Devanagari
>
> On this subject, Win2K and IE5+ seem to do a nice job displaying UTF-8-encoded Hindi. On the Mac, the Indian Language Kit provides for OS support and fonts (with MacOS 9.2 and above), but I have not been able to display Hindi (UTF-8 encoded) with Mac's IE 5.1. Am I correct in assuming that the Mac version of IE does not support Hindi without a hack?
> Etienne
Re: Devanagari
At 23:19 -0600 2002-01-20, David Starner wrote: There is no simple encoding scheme that will encode Indic text in Unicode in one byte per character. Raw 32-bit encoding treats all characters equally, doesn't it? :-) -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari
At 00:39 -0500 2002-01-21, Aman Chawla wrote: The issue was originally brought up to gather opinion from members of this list as to whether UTF-8 or ISCII should be used for creating Devanagari web pages. The point is not to criticise Unicode but to gather opinions of informed persons (list members) and determine what is the best encoding for information interchange in South-Asian scripts... If you want only local users who have ISCII fonts to read them, use ISCII. I wouldn't be able to read such pages, though, because I don't have any ISCII support under Mac OS X. I *do* have Unicode-based Devanagari support, though. The best encoding for information interchange is to use a SINGLE encoding, namely Unicode. For the Web, use UTF-8. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari
* [EMAIL PROTECTED] | | This is why I really wish that SCSU were considered a truly | standard encoding scheme. Even among the Unicode cognoscenti it | is usually accompanied by disclaimers about private agreement only | and not suitable for use on the Internet, where the former claim | is only true because of the self-perpetuating obscurity of SCSU and | the latter seems completely unjustified. Do you know of any published web pages that use SCSU? I think that's probably the place to start. I never add support for encodings I can't find in actual use on the web. (Hint hint. :) Note that IANA *does* consider SCSU a real encoding scheme, since they've defined a tag for it. (Thanks to Markus Scherer, I know, but it does help.) --Lars M.
Re: Devanagari
Aman, What is it you want? To complain about the architecture of Unicode and UTF-8? For good or ill, it isn't going to change. Neither was it a conspiracy to suppress the non-English-speaking peoples of the world. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari
Aman Chawla wrote, With regards to South Asia, where the most widely used modems are approx. 14 kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly unheard of, efficiency in data transmission is of paramount importance... how can we convince the south asian user to create websites in an encoding that would make his client's 14 kbps modem as effective (rather, ineffective) as a 4.6 kbps modem? This is a very good question. How, indeed? There are pros and cons for practically any situation and it seems that you are asking these questions in order to help evaluate those pros and cons. A while back, Benefits of Unicode was a very interesting thread on this list and the results of those discussions formed the basis for some web pages on the subject. Tex Texin made the original page about the benefits, which is on-line at: http://www.geocities.com/i18nguy/UnicodeBenefits.html The page offers links to other pages resulting from the same thread, including a page by Suzanne Topping listing some of the disadvantages of Unicode. One way to encourage members of your user community to embrace the standard might be to offer a translation of some of that material in Unicode Hindi, with the respective authors' permissions, of course, and post it on the web. With newer operating systems being Unicode-based, there are no special plug-ins or filters involved. A sophisticated and elegant writing system like Devanagari justifies having a clever rendering system for display. Unicode and OpenType support for such scripts are expected to be built-in to operating systems. As far as I can tell, under the current OpenType model, OpenType support for Indic scripts under ISCII encoding isn't possible because the features required not only for plain text Devanagari, but also for typographically advanced Devanagari are registered to the various Indic Unicode ranges. 
At least as far as Microsoft OSes are concerned, a feature like the half-letter form won't be applied to anything encoded in the ASCII or ISO 8859-1 range. (Once again, as far as I can tell.) Encourage people to look towards the future. Studying the trend over the past several years as more groups and systems moved towards the Unicode Standard might foster the belief that anyone converting their existing files from a localized encoding into Unicode would be converting those files for the last time. On the other hand, converting from a localized encoding into ISCII would possibly mean that the material would eventually need to be converted into Unicode anyway. I agree with you that efficient and effective data transmission is extremely important and suggest that the most effective way to exchange data is in a standard fashion which is supported worldwide. Best regards, James Kass.
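The bandwidth figure debated in this thread is simple arithmetic: Devanagari letters occupy three bytes each in UTF-8 versus one byte each in ISCII, so a stream of pure Devanagari text carries about a third as many characters per second, which is where the "14 kbps behaves like 4.6 kbps" claim comes from. A quick check (the 14 kbps figure is the one quoted in the thread; real pages also contain one-byte ASCII markup, so the effective ratio is smaller in practice):

```python
text = "\u0915\u0916\u0917" * 100      # 300 Devanagari characters
utf8_bytes = len(text.encode("utf-8"))

ratio = utf8_bytes / len(text)         # bytes per character in UTF-8
print(ratio)                           # 3.0 for pure Devanagari text

modem_kbps = 14                        # modem speed quoted in the thread
print(round(modem_kbps / ratio, 1))    # ~4.7 kbps of ISCII-equivalent throughput
```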
Re: Devanagari
On Sun, 20 Jan 2002 23:57:29 -0500 Aman Chawla [EMAIL PROTECTED] wrote: With regards to South Asia, where the most widely used modems are approx. 14 kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly unheard of, efficiency in data transmission is of paramount importance... how can we convince the south asian user to create websites in an encoding that would make his client's 14 kbps modem as effective (rather, ineffective) as a 4.6 kbps modem? Leave aside that most don't even have modems, or for that matter PCs; they use cybercafes, with 8-10 users sharing a 33 kbps connection. And AFAIK (except for the sites containing Sanskrit manuscripts) no Indian-language website uses even ISCII; text there is in some ad hoc font encoding. So pages can't be indexed by a search engine, and you can't save a page to read or print later. The tools created for making Indian-language web pages (actually, most people just use FrontPage) don't even save pages in ISCII, let alone export to ISCII. Unicode solves a lot of these issues, but existing vendors of Indian-language software would find it inconvenient, as they could no longer lock users in to their products. A lot more could be said, but I would go off topic (I think I am already ;-). Regards, Karunakar
RE: Devanagari
Doug Ewell wrote: Devanagari text encoded in SCSU occupies exactly 1 byte per character, plus an additional byte near the start of the file to set the current window (0x14 = SC4). The problem is what happens if that very byte gets corrupted for any reason... If an octet is erroneously deleted, changed or added in a UTF-8 stream, only a single character is corrupted. If the same thing happens to the window-setting byte of SCSU (or other similarly zany formats), the whole stream turns into garbage. What this means in practice for website developers is: 1) SCSU text can only be edited with a text editor which properly decodes the *whole* file on load and re-encodes it on save. On the other hand, UTF-8 text can also be edited using an encoding-unaware editor, although non-ASCII text is invisible. 2) SCSU text cannot be built by assembling binary pieces coming from external sources. E.g., you cannot take a SCSU-encoded template file and fill in the blanks with customer data coming from a SCSU-encoded database: each time you insert a piece of text coming from the database, you delete the current window information, turning the rest of the file into garbage. On the other hand, UTF-8 allows this, provided that the integrity of each multi-byte sequence is maintained. 3) A SCSU page can only be accepted by browsers and e-mail readers that are able to decode it. On the other hand, UTF-8 also works on old ASCII-based browsers, although non-ASCII text is clearly not properly displayed. _ Marco
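Marco's point about damage locality in UTF-8 can be demonstrated: flip one byte in a UTF-8 stream and only the character(s) touching that byte are lost, while everything before and after decodes intact. A sketch (using Python's replacement-character error handling to stand in for a tolerant decoder):

```python
text = "\u0915\u0916\u0917"                 # three Devanagari letters, 9 UTF-8 bytes
data = bytearray(text.encode("utf-8"))

data[4] = 0x41                              # corrupt one byte inside the middle letter
decoded = bytes(data).decode("utf-8", errors="replace")

print(decoded[0] == "\u0915")               # True: first character survives
print(decoded[-1] == "\u0917")              # True: last character survives
# Only the middle letter is damaged. In a stateful format like SCSU,
# corrupting a window-setting byte can garble everything that follows instead.
```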
SCSU (was: Re: Devanagari)
In a message dated 2002-01-21 1:33:23 Pacific Standard Time, [EMAIL PROTECTED] writes: Do you know of any published web pages that use SCSU? I think that's probably the place to start. I never add support for encodings I can't find in actual use on the web. (Hint hint. :) This becomes a vicious circle, as it is just as reasonable to say that I never create Web pages in encodings that existing browsers can't support. I'm not sure what is the best way to break this circle, except that when I do finally set up a Web site (☺) I might include a parallel SCSU version along with the UTF-8 version, along with a brief description of SCSU. -Doug Ewell Fullerton, California
SCSU (was: Re: Devanagari)
In a message dated 2002-01-21 5:20:55 Pacific Standard Time, [EMAIL PROTECTED] writes: Doug Ewell wrote: Devanagari text encoded in SCSU occupies exactly 1 byte per character, plus an additional byte near the start of the file to set the current window (0x14 = SC4). The problem is what happens if that very byte gets corrupted for any reason... If an octet is erroneously deleted, changed or added from an UTF-8 stream, only a single character would be corrupted. If the same thing happens to the window-setting byte of a SCSU (or other similar zany formats), the whole stream turns into garbage. Yes, SCSU is stateful and the corruption of a single tag, or argument to a tag, could potentially damage large amounts of text. I know this was a big problem in the days of devices and transmission protocols that did little or no error correction. I honestly don't know how big a problem it is today. What this means in practice for website developers is: 1) SCSU text can only be edited with a text editor which properly decodes the *whole* file on load and re-encodes it on save. On the other hand, UTF-8 text can also be edited using an encoding-unaware editor, although non-ASCII text is invisible. I have edited SCSU text using a completely encoding-ignorant MS-DOS editor. Of course I couldn't edit the SCSU control bytes intelligently, but then I can't edit multibyte UTF-8 sequences intelligently with it either. 2) SCSU text cannot be built by assembling binary pieces coming from external sources. E.g., you cannot get a SCSU-encoded template file and fill in the blanks with customer data coming from a SCSU-encoded database: each time you insert a piece of text coming from the database, you delete the current window information, turning into garbage the rest of the file. The current window information is not deleted, it is carried over into any adjoining text that does not redefine it. (This could have its own repercussions, of course.) 
3) A SCSU page can only be accepted by browsers and e-mail readers that are able to decode it. On the other hand, UTF-8 also works on old ASCII-based browsers, although non-ASCII text is clearly not properly displayed. Same as 1). If you have only ASCII text, SCSU == UTF-8 == ASCII, and if you have non-ASCII text, both SCSU and UTF-8 encode that text with byte sequences that readers must know how to decode. SCSU does use states, like any compression scheme, so an encoding-ignorant tool will probably have more trouble with SCSU than with UTF-8. But I was not arguing to foist SCSU on an unprepared world, I was suggesting that the world should prepare. ☺ -Doug Ewell Fullerton, California
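Doug's "SCSU == UTF-8 == ASCII" observation is easy to verify for the UTF-8 half; the SCSU half follows from UTS #6, whose default single-byte mode passes the printable ASCII bytes 0x20..0x7F (and common controls) through unchanged. A minimal check of the UTF-8 side:

```python
s = "plain ASCII markup <html>"
# For ASCII-only text, the UTF-8 byte stream is identical to the ASCII one.
print(s.encode("utf-8") == s.encode("ascii"))   # True

t = "hindi: \u0915"
# Add one non-ASCII character and the streams necessarily diverge:
# DEVANAGARI LETTER KA takes 3 bytes in UTF-8, so 2 extra bytes over the char count.
print(len(t.encode("utf-8")) - len(t))          # 2
```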
RE: Devanagari
Aman Here in Bhutan the Internet connection is still much worse than in most places I've visited in India & Nepal (and the cost per minute is several times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not display noticeably slower than ASCII, ISCII or 8-bit font encoded pages - and I don't need to download any special plug-ins or fonts. - Chris -- Christopher J Fynn Thimphu, Bhutan [EMAIL PROTECTED] [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Aman Chawla Sent: 21 January 2002 10:57 To: James Kass; Unicode Subject: Re: Devanagari - Original Message - From: James Kass [EMAIL PROTECTED] To: Aman Chawla [EMAIL PROTECTED]; Unicode [EMAIL PROTECTED] Sent: Monday, January 21, 2002 12:46 AM Subject: Re: Devanagari 25% may not be 300%, but it isn't insignificant. As you note, if the mark-up were removed from both of those files, the percentage of increase would be slightly higher. But, as connection speeds continue to improve, these differences are becoming almost minuscule. With regards to South Asia, where the most widely used modems are approx. 14 kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly unheard of, efficiency in data transmission is of paramount importance... how can we convince the south asian user to create websites in an encoding that would make his client's 14 kbps modem as effective (rather, ineffective) as a 4.6 kbps modem?
RE: Devanagari
On this subject, Win2K and IE5+ seem to do a nice job displaying UTF-8-encoded Hindi. On the Mac, the Indian Language Kit provides for OS support and fonts (with MacOS 9.2 and above), but I have not been able to display Hindi (UTF-8 encoded) with Mac's IE 5.1. Am I correct in assuming that the Mac version of IE does not support Hindi without a hack?

Etienne

Reply-To: [EMAIL PROTECTED]
From: "Christopher J Fynn" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: "Aman Chawla" [EMAIL PROTECTED]
Subject: RE: Devanagari
Date: Mon, 21 Jan 2002 23:59:38 +0600

Aman Here in Bhutan the Internet connection is still much worse than in most places I've visited in India & Nepal (and the cost per minute is several times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not display noticeably slower than ASCII, ISCII or 8-bit font encoded pages - and I don't need to download any special plug-ins or fonts. - Chris -- Christopher J Fynn Thimphu, Bhutan [EMAIL PROTECTED] [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Aman Chawla Sent: 21 January 2002 10:57 To: James Kass; Unicode Subject: Re: Devanagari - Original Message - From: James Kass [EMAIL PROTECTED] To: Aman Chawla [EMAIL PROTECTED]; Unicode [EMAIL PROTECTED] Sent: Monday, January 21, 2002 12:46 AM Subject: Re: Devanagari 25% may not be 300%, but it isn't insignificant. As you note, if the mark-up were removed from both of those files, the percentage of increase would be slightly higher. But, as connection speeds continue to improve, these differences are becoming almost minuscule. With regards to South Asia, where the most widely used modems are approx. 14 kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly unheard of, efficiency in data transmission is of paramount importance... how can we convince the south asian user to create websites in an encoding that would make his client's 14 kbps modem as effective (rather, ineffective) as a 4.6 kbps modem? 
Re: Devanagari
Aman Chawla wrote, I would be grateful if I could get opinions on the following: 1. Which encoding/character set is most suitable for using Hindi/Marathi (both of which use Devanagari) on the internet as well as in databases, and why? In your response, please refer to: http://www.iiit.net/ltrc/Publications/iscii_plugin_display.html, particularly the following paragraphs: snip Unicode is the best. It is the world's standard for computer encoding, and, as such, offers the best possibility that text can be exchanged around the globe and cross-platform. The arguments about relative size are true, but in this day and age are considered unimportant. Graphics files are extremely large in comparison with text files in any script, and so are sound files. Devanagari in UTF-8 is three bytes per character. Four-byte UTF-8 sequences are so far used only for Plane One of Unicode and above. 3. With reference to the previous question, can programs that convert the myriad Devanagari encodings in use today to a standard encoding (question 1) be made freely available, and how? Yes, converters exist and are being distributed. Just go to the Google search engine and input character conversion Unicode into the box. Look for ICU and Rosette, to name a few. You might even run across Mark Leisher's download page at: http://crl.nmsu.edu/~mleisher/download.html and see the Perl script for converting the Naidunia Devanagari encoding to UTF-16. 4. Is there any search engine on the internet that maintains an up-to-date index of sites in Devanagari? If not, what can be done to encourage proprietary search engines to support Hindi? Google supposedly has a Hindi language option, but surprise, it's in Roman script! Several emails to them have elicited the response: At the moment we don't support Devanagari... This appears to be because Google is converting UTF-8 strings input to the search-words box into decimal NCRs. When यूनिकोड क्या है is pasted into the Google box, it displays fine. Since the What is Unicode?
pages are popular and have been up for a while, I thought that it would have a good chance of being indexed. But, there were no hits for the resulting search string: &#2351;&#2370;&#2344;&#2367;&#2325;&#2379;&#2337; &#2325;&#2381;&#2351;&#2366; &#2361;&#2376; ...which is not surprising, since the actual page doesn't use NCRs. Best regards, James Kass.
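James's diagnosis above (Google turning UTF-8 input into decimal NCRs) is easy to reproduce. The following is only a sketch of the transformation he describes, not Google's actual code; the function name to_ncrs is ours for illustration:

```python
# Convert non-ASCII characters to decimal numeric character references (NCRs),
# mimicking the transformation James describes Google applying to search input.
def to_ncrs(text: str) -> str:
    return "".join(f"&#{ord(ch)};" if ord(ch) > 0x7F else ch for ch in text)

# "यूनिकोड क्या है" -- the Hindi phrase for "What is Unicode?" from the message
phrase = "\u092f\u0942\u0928\u093f\u0915\u094b\u0921 \u0915\u094d\u092f\u093e \u0939\u0948"
print(to_ncrs(phrase))
# &#2351;&#2370;&#2344;&#2367;&#2325;&#2379;&#2337; &#2325;&#2381;&#2351;&#2366; &#2361;&#2376;
```

The output reproduces exactly the search string James quotes, which is why a page encoded in plain UTF-8 never matches it.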
Re: Devanagari Rupee Symbol
At 11:22 -0500 2002-01-20, Aman Chawla wrote: I am unable to find the Devanagari Rupee sign encoded in Unicode? Is it encoded? If not, why? U+20A8. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Devanagari
At 12:48 AM 1/20/02 -0800, James Kass wrote: The arguments about relative size are true, but in this day and age are considered unimportant. Graphics files are extremely large in comparison with text files of any script and so are sound files. Devanagari UTF-8 is three bytes. The four byte UTF-8 sequences so far are only used for Plane One Unicode and up. If the argument refers to 4-byte sequences for Devanagari, it is not factually 'true', as James points out. More to the point is the following observation: HTML and similar mark-up languages account for an ever-growing percentage of transmitted text - even in e-mail. The fact that UTF-8 economizes on the storage for ASCII characters is a benefit for *all* HTML users, as the HTML syntax is entirely in ASCII and claims a significant fraction of the data. A UTF-8 encoded HTML file will therefore have (percentage-wise) less overhead for Devanagari than claimed. Add to that James' observation on graphics files, many of which accompany even the simplest HTML documents, and you get a percentage difference between the sizes of an English and a Devanagari website (i.e. in its entirety) that's well within the fluctuation of the typical length in characters for expressing the same concept in different languages. In other words, contrary to the claims made by the argument, it is hard to predict that this structure of UTF-8 will have an observable impact on exchanging data - other than psychological, perhaps. In many size-constrained application areas it may pay off to do compression. http://www.unicode.org/unicode/reports/tr6 shows how one can compress Unicode data in Devanagari to a size comparable to that of 8-bit ISCII. However, interchange of this format (SCSU) requires consenting parties. A./
Re: Devanagari
The fact that UTF-8 economizes on the storage for ASCII characters, is a benefit for *all* HTML users, as the HTML syntax is entirely in ASCII and claims a significant fraction of the data. A UTF-8 encoded HTML file, will therefore have (percentage-wise) less overhead for Devanagari as claimed. Add to that James' observation on graphics files, many of which accompany even the simplest HTML documents and you get a percentage difference between the sizes of an English and Devanagari website (i.e. in its entirety) that's well within the fluctuation of the typical length in characters, for expressing the same concept in different languages. The point was that a UTF-8 encoded HTML file for an English web page carrying say 10 gifs would have a file size one-third that for a Devanagari web page with the same no. of gifs - even if you take into account the fluctuation of the typical length in characters, for expressing the same concept in different languages. This is because in some cases one language may express a concept more compactly while in other cases it may not, and on the whole this effect would balance out and can therefore be neglected. Therefore transmission of a Devanagari web page over a network would take thrice as long as that of an English web page using the same images and presenting the same information.
Re: Devanagari
In a message dated 2002-01-20 16:49:17 Pacific Standard Time, [EMAIL PROTECTED] writes: The point was that a UTF-8 encoded HTML file for an English web page carrying say 10 gifs would have a file size one-third that for a Devanagari web page with the same no. of gifs... Therefore transmission of a Devanagari web page over a network would take thrice as long as that of an English web page using the same images and presenting the same information. This conclusion ignores two obvious points, which Asmus already made: (1) The 10 GIFs, each of which may well be larger than the HTML file, take the same amount of space regardless of the encoding of the HTML file. The total number of bytes involved in transmitting a Web page includes everything, HTML and graphics, but the purported factor of 3 applies only to the HTML. (2) The markup in an HTML file, which comprises a significant portion of the file, is all ASCII. So the factor of 3 doesn't even apply to the entire HTML file, only to the plain-text content portion. In addition, text written in Devanagari includes plenty of instances of U+0020 SPACE, plus CR and/or LF, each of which occupies one byte regardless of the encoding. I think before worrying about the performance and storage effect on Web pages due to UTF-8, it might help to do some profiling and see what the actual impact is. -Doug Ewell Fullerton, California
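Doug's byte counts above can be verified directly; a small Python check, using nothing beyond the standard library:

```python
# Each Devanagari code point (U+0900..U+097F) occupies three bytes in UTF-8,
# while ASCII characters -- including SPACE, CR, and LF -- occupy one.
devanagari_ka = "\u0915"                     # क DEVANAGARI LETTER KA
print(len(devanagari_ka.encode("utf-8")))    # 3
print(len(" ".encode("utf-8")))              # 1
print(len("\r\n".encode("utf-8")))           # 2

# A short phrase with a space: 4 letters * 3 bytes + 1 byte for the space
phrase = "\u0915\u0916 \u0915\u0916"         # "कख कख"
print(len(phrase.encode("utf-8")))           # 13
```

So the factor of 3 applies only to the Devanagari letters themselves; every space and line break dilutes it, exactly as Doug says.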
Re: Devanagari
On Sun, Jan 20, 2002 at 07:39:57PM -0500, Aman Chawla wrote: : The point was that a UTF-8 encoded HTML file for an English web page : carrying say 10 gifs would have a file size one-third that for a Devanagari : web page with the same no. of gifs - even if you take into account the : fluctuation of the typical length in characters, for expressing the same : concept in different languages. This is because in some cases one language : may express a concept more compactly while in other cases it may not, and on : the whole this effect would balance out and can therefore be neglected. : Therefore transmission of a Devanagari web page over a network would take : thrice as long as that of an English web page using the same images and : presenting the same information. And the whole UTF-8 Devanagari page is probably still smaller than even one of the .gif files. -- Christopher Vance
Re: Devanagari
On Sun, Jan 20, 2002 at 07:39:57PM -0500, Aman Chawla wrote: The point was that a UTF-8 encoded HTML file for an English web page carrying say 10 gifs would have a file size one-third that for a Devanagari web page with the same no. of gifs The point is that the text for a short webpage is 10k for English and 30k for Devanagari, the HTML will be another 10k for English and another 10k for Devanagari, and the graphics will be another 30k for English and another 30k for Devanagari, meaning that the total will be 50k for English and 70k for Devanagari - a 40% markup, not 200%. Adding a 150k graphic would make it 200k for English and 220k for Devanagari, making it a 10% markup. -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org When the aliens come, when the deathrays hum, when the bombers bomb, we'll still be freakin' friends. - Freakin' Friends
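David's arithmetic can be restated as a tiny calculation; the function name and the factor-of-3 parameter are illustrative only:

```python
# Totals and percentage overhead for David's worked page-size example:
# the plain text triples in size under UTF-8, HTML markup and graphics do not.
def overhead(text_kb, html_kb, graphics_kb, factor=3):
    english = text_kb + html_kb + graphics_kb
    devanagari = text_kb * factor + html_kb + graphics_kb
    return english, devanagari, (devanagari - english) * 100 / english

print(overhead(10, 10, 30))    # (50, 70, 40.0)  -- 40% overhead, not 200%
print(overhead(10, 10, 180))   # (200, 220, 10.0) -- with the extra 150k graphic
```

The overhead shrinks toward zero as the fixed (encoding-independent) parts of the page grow, which is the whole of David's point.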
Re: Devanagari
Doug Ewell wrote, I think before worrying about the performance and storage effect on Web pages due to UTF-8, it might help to do some profiling and see what the actual impact is. The What is Unicode? pages offer a quick study. 14808 bytes (English) 15218 bytes (Hindi) 10808 bytes (Danish) 11281 bytes (French) 9682 bytes (Chinese Trad.) (The English page includes links to all the other scripts, but the individual script pages only link back to the English page. So, the English page is a bit larger than the other pages for this reason, not a fair test if we only count the English and Hindi pages.) The Unicode logo gif at the top left corner of each of these pages takes bytes. A screen shot of the beginning of the Hindi page takes 37569 bytes as a gif, the small portion cropped and attached takes 4939 bytes. The What is Unicode? pages are at: http://www.unicode.org/unicode/standard/WhatIsUnicode.html Best regards, James Kass. hindiwhatis.gif Description: GIF image
Re: Devanagari
At 10:44 PM 1/20/2002 -0500, you wrote: Taking the extra links into account the sizes are: English: 10.4 Kb Devanagari: 15.0 Kb Thus the Dev. page is 1.44 times the Eng. page. For sites providing archives of documents/manuscripts (in plain text) in Devanagari, this factor could be as high as approx. 3 using UTF-8 and around 1 using ISCII. Yes, but that is this page only. Are you suggesting that all pages will vary by that factor? Of course not. Please consider whether the space *in practice* is a limiting factor. It seems that folks on the list feel it is not: not for bandwidth-limited applications, and not for disk-space-limited applications. The amount of space devoted to plain text of any language on a typical web page is microscopic compared to the markup, images, sounds, and other files also associated with the web page. Are you suggesting that UTF-8 ought to have been optimized for Devanagari text? Barry Caplan www.i18n.com -- coming soon...
Re: Devanagari
- Original Message - From: James Kass [EMAIL PROTECTED] To: Aman Chawla [EMAIL PROTECTED]; Unicode [EMAIL PROTECTED] Sent: Monday, January 21, 2002 12:46 AM Subject: Re: Devanagari 25% may not be 300%, but it isn't insignificant. As you note, if the mark-up were removed from both of those files, the percentage of increase would be slightly higher. But, as connection speeds continue to improve, these differences are becoming almost minuscule. With regards to South Asia, where the most widely used modems are approx. 14 kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly unheard of, efficiency in data transmission is of paramount importance... how can we convince the South Asian user to create websites in an encoding that would make his client's 14 kbps modem as effective (rather, ineffective) as a 4.6 kbps modem?
Re: Devanagari
On Sun, Jan 20, 2002 at 10:44:00PM -0500, Aman Chawla wrote: For sites providing archives of documents/manuscripts (in plain text) in Devanagari, this factor could be as high as approx. 3 using UTF-8 and around 1 using ISCII. Uncompressed, yes. It shouldn't be nearly as bad compressed - gzip, zip, bzip2, or whatever your favorite tool is. You could also use UTF-16 or SCSU, which will get it down to about 2 or about 1, respectively. What's your point in continuing this? Most of the people on this list already know how UTF-8 can expand the size of non-English text. There's nothing we can do about it. Even if you had brought it up when UTF-8 was being designed, there's not much anyone could have done about it. There is no simple encoding scheme that will encode Indic text in Unicode in one byte per character. It's the pigeonhole principle in action - if you need to encode 150,000 characters, you can't encode each one in one or two bytes, and while you can write encodings that approach that for normal text, they aren't going to be simple or pretty. -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org When the aliens come, when the deathrays hum, when the bombers bomb, we'll still be freakin' friends. - Freakin' Friends
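David's remark that compression largely removes the UTF-8 size penalty can be sketched with the standard zlib module. The sample text and repetition count below are arbitrary, and real documents compress far less dramatically than this repetitive example:

```python
import zlib

# Compare raw UTF-8 size with its zlib-compressed size for repetitive
# Devanagari text. Each of the 6 Devanagari letters costs 3 bytes raw,
# plus 1 byte for the space: 19 bytes per repetition.
text = "\u0928\u092e\u0938\u094d\u0924\u0947 " * 200   # "नमस्ते " repeated
raw = text.encode("utf-8")
packed = zlib.compress(raw, 9)
print(len(raw))                  # 3800
print(len(packed) < len(raw))    # True -- compression erases most of the penalty
```

With gzip, bzip2, SCSU, or similar, the stored-archive factor Aman worries about shrinks well below 3, as David and Geoffrey both note.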
Re: Devanagari
- Original Message - From: "David Starner" [EMAIL PROTECTED] To: "Aman Chawla" [EMAIL PROTECTED] Cc: "James Kass" [EMAIL PROTECTED]; "Unicode" [EMAIL PROTECTED] Sent: Monday, January 21, 2002 12:19 AM Subject: Re: Devanagari What's your point in continuing this? Most of the people on this list already know how UTF-8 can expand the size of non-English text. The issue was originally brought up to gather opinion from members of this list as to whether UTF-8 or ISCII should be used for creating Devanagari web pages. The point is not to criticise Unicode but to gather opinions of informed persons (list members) and determine what is the best encoding for information interchange in South Asian scripts...
Re: Devanagari
On Sun, 20 Jan 2002, Aman Chawla wrote: Taking the extra links into account the sizes are: English: 10.4 Kb Devanagari: 15.0 Kb Thus the Dev. page is 1.44 times the Eng. page. For sites providing archives of documents/manuscripts (in plain text) in Devanagari, this factor could be as high as approx. 3 using UTF-8 and around 1 using ISCII. Well, a trivial adjustment is to use UTF-16 to store your documents if you know they are going to be predominantly Devanagari. Or if you have so much text that the number of extra disks is going to be painful, use SCSU to bring it very close to the ISCII ratio. Of course I would note that you can store millions of pages of plain text on a single hard disk these days. If you're going to be storing so many hundreds of millions of pages of plain text that the number of extra disks is a bother, I am amazed that none of it might be outside the ISCII repertoire. And this huge document archive has no graphics component to go with it... But the real reason for publishing the data in Unicode on the web is so people not using a machine specially configured for ISCII will still be able to read and process the data. [then later wrote:] With regards to South Asia, where the most widely used modems are approx. 14 kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly unheard of, efficiency in data transmission is of paramount importance... how can we convince the South Asian user to create websites in an encoding that would make his client's 14 kbps modem as effective (rather, ineffective) as a 4.6 kbps modem? Can you read 500 characters per second? So long as they are receiving only plain text, even this dawdling speed is not going to impact them. People wanting to efficiently transfer data will use a compression program. Geoffrey
Re: Devanagari
In a message dated 2002-01-20 20:49:00 Pacific Standard Time, [EMAIL PROTECTED] writes: Usually, when someone offers a large body of plain text in any script, files are compressed in one way or another in order to speed up downloads. This is why I really wish that SCSU were considered a truly standard encoding scheme. Even among the Unicode cognoscenti it is usually accompanied by disclaimers about private agreement only and not suitable for use on the Internet, where the former claim is only true because of the self-perpetuating obscurity of SCSU and the latter seems completely unjustified. Devanagari text encoded in SCSU occupies exactly 1 byte per character, plus an additional byte near the start of the file to set the current window (0x14 = SC4). -Doug Ewell Fullerton, California
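Doug's one-byte-per-character claim can be illustrated with a toy encoder. This handles only the single case he describes (one SC4 tag byte, then window-4 single-byte mode); it is not a conforming SCSU implementation, and the names are ours:

```python
# Toy sketch of Doug's SCSU point: the SC4 tag byte (0x14) selects dynamic
# window 4, predefined at U+0900 (the Devanagari block). After that, each
# Devanagari character maps to one byte (offset from the window base, plus
# 0x80), and ASCII bytes pass through unchanged.
SC4 = 0x14
WINDOW4_BASE = 0x0900

def scsu_devanagari(text: str) -> bytes:
    out = [SC4]
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7F or cp in (0x09, 0x0A, 0x0D):
            out.append(cp)                        # ASCII passes through
        elif WINDOW4_BASE <= cp < WINDOW4_BASE + 0x80:
            out.append(cp - WINDOW4_BASE + 0x80)  # one byte per character
        else:
            raise ValueError(f"outside this toy encoder's range: U+{cp:04X}")
    return bytes(out)

encoded = scsu_devanagari("\u092f\u0942\u0928\u093f\u0915\u094b\u0921")  # यूनिकोड
print(len(encoded))   # 8 -- one tag byte plus seven single-byte characters
```

Seven Devanagari characters cost eight bytes total, versus twenty-one in UTF-8, matching Doug's "1 byte per character plus one byte near the start" description.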
Re: Devanagari
In a message dated 2002-01-20 21:49:02 Pacific Standard Time, [EMAIL PROTECTED] writes: The issue was originally brought up to gather opinion from members of this list as to whether UTF-8 or ISCII should be used for creating Devanagari web pages. The point is not to criticise Unicode but to gather opinions of informed persons (list members) and determine what is the best encoding for information interchange in South-Asian scripts... It seems that the only point against Unicode compared to ISCII is the resulting document size in bytes, and this one point is being given 100% focus in the comparison. If the actual question is, What is the most efficient encoding for Devanagari text, in terms of bytes, using only the most commonly encountered encoding schemes and no external compression? then of course you will have loaded the question in favor of ISCII. But when you consider that more browsers today around the world (not just in India) are equipped to handle Unicode than ISCII, and that Unicode allows not only the encoding of ASCII and Devanagari but the full complement of Indic scripts (Oriya, Gujarati, Tamil...) as well as any other script on the planet that you could realistically want to encode, you will probably have to rethink the cost/benefit tradeoff of Unicode. -Doug Ewell Fullerton, California
Re: Devanagari
On Mon, Jan 21, 2002 at 12:57:39AM -0500, [EMAIL PROTECTED] wrote: This is why I really wish that SCSU were considered a truly standard encoding scheme. Even among the Unicode cognoscenti it is usually accompanied by disclaimers about private agreement only and not suitable for use on the Internet, where the former claim is only true because of the self-perpetuating obscurity of SCSU and the latter seems completely unjustified. Does Mozilla support it? If someone's willing to spend a little time, adding it to Mozilla is one way to make it more generally useable. And maybe then IE will get nudged into playing a little catchup . . . -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org When the aliens come, when the deathrays hum, when the bombers bomb, we'll still be freakin' friends. - Freakin' Friends
RE: Devanagari question
From: Rick McGowan [mailto:[EMAIL PROTECTED]] Mike Ayers wrote: The last I knew, computer-savvy Taiwan and Hong Kong were continuing to invent new characters. In the end, the onus is on the computer to support the user. Yes, the computer should support the user, but... The invention of new characters to serve multitudes is OK, and international standards will probably continue to support that. But I don't think it's reasonable or appropriate to keep inventing new characters willy-nilly for individuals (as reported), and then expect them to be added to an international standard. That's silly. The onus is not on international standards to support the whimsical production of novel, rarely-used, or nonce characters of the type reported to be generated. That is not established. The degree to which computer or user will dictate what will and will not be permitted has yet to be decided. Certainly, I already have full support for any words that I care to make up - I need merely spell them. Since hanzi are words-as-characters, the issue is much more cloudy, since the position of the Unicode specification (due to the encoding method used) is that hanzi are characters-only. This may not be the final solution. In any case, I still have never seen actual documentary evidence that would prove to me that in fact Taiwan and Hong Kong *ARE* creating new characters at the drop of a hat. People just keep saying that to scare everyone. Sounds like an urban myth to me. Good point. I will go seek a definitive answer. Not much point in discussing this if it doesn't really happen. /|/|ike
Re: Devanagari question
Mark Davis wrote: The Unicode Standard does define the rendering of such combinations, which is in the absence of any other information to stack outwards. A dumb implementation would simply move the accent outwards if there was one in the same position. This will not necessarily produce an optimal positioning, but should be readable. Note that it also should increase the line spacing. Note also that the renderer should notice that event even when there are interleaved irrelevant (zero-width) characters. And we are using a dumb implementation. Anyway, my point was not about this, which are, as you say, the basics of the dumbest renderer. No, I was thinking about the implications of mixing Nagari consonants with kana diacritics (or the contrary); or circling (U+20DD) around Indian conjuncts, or else around superscript digits; or Tibetan subjoined below Latin letters (how do they attach?); or Jamos followed by a virama or a Telugu length mark. Etc. My point was that it is *not* a good idea to render an out-of-context Telugu length mark (U+0C55), when it follows for example a Latin vowel, as a macron, even if this is the "logical" behaviour. Such code will be, IMHO, just waste. If it takes megabytes of code to do [that] there is probably something else wrong. I do not count a dumb implementation as "decent". And yes, I was overemphasizing with "megabytes". The OT support in FreeType, which does only a small part of this task, is only 315 Kbytes of C code. So I expect the not-so-dumb renderer based on it to be around 0.5 megabyte. Which does not take into account the code embedded in the OT fonts themselves. As a result, yes, please remove the "s". Antoine
RE: Devanagari question
From: D.V. Henkel-Wallace [mailto:[EMAIL PROTECTED]] At 06:30 2000-11-14 -0800, Marco Cimarosti wrote: But my point was: not even Mr. Ethnologue himself knows exactly *which* combinations are meaningful, in all orthographic systems. And, clearly, no one can figure out which combinations may become meaningful in the *future* -- e.g. when a previously unwritten language gets its orthography, or when the spelling of an already written language gets changed. Sadly, it seems unlikely that any future change or adoption of orthography will use characters not already supported by the then-major computer systems. In fact the trend seems to be the other way, viz. Spain's changing of its collation rules. I do not think that this is a trend. The last I knew, computer-savvy Taiwan and Hong Kong were continuing to invent new characters. In the end, the onus is on the computer to support the user. Only during the current frenzy of computerization is the reverse permitted - this will pass. For a minority language (which all remaining unwritten languages are) the pressure will be strong to use existing combinations (since they won't constitute a large enough community for people to write special rendering support). That depends on how you look at it. From what I understand (which I freely admit I have learned only from this list), Indic languages tend to be supported in toto, and therefore even the currently unwritten ones will belong to a highly non-minority language family. $.02, /|/|ike
RE: Devanagari question
Mike Ayers wrote: The last I knew, computer-savvy Taiwan and Hong Kong were continuing to invent new characters. In the end, the onus is on the computer to support the user. Yes, the computer should support the user, but... The invention of new characters to serve multitudes is OK, and international standards will probably continue to support that. But I don't think it's reasonable or appropriate to keep inventing new characters willy-nilly for individuals (as reported), and then expect them to be added to an international standard. That's silly. The onus is not on international standards to support the whimsical production of novel, rarely-used, or nonce characters of the type reported to be generated. In any case, I still have never seen actual documentary evidence that would prove to me that in fact Taiwan and Hong Kong *ARE* creating new characters at the drop of a hat. People just keep saying that to scare everyone. Sounds like an urban myth to me. Rick
RE: Devanagari question
On Tue, 14 Nov 2000, Rick McGowan wrote: Mike Ayers wrote: The last I knew, computer-savvy Taiwan and Hong Kong were continuing to invent new characters. In the end, the onus is on the computer to support the user. Yes, the computer should support the user, but... The invention of new characters to serve multitudes is OK, and international standards will probably continue to support that. But I don't think it's reasonable or appropriate to keep inventing new characters willy-nilly for individuals (as reported), and then expect them to be added to an international standard. That's silly. The onus is not on international standards to support the whimsical production of novel, rarely-used, or nonce characters of the type reported to be generated. In any case, I still have never seen actual documentary evidence that would prove to me that in fact Taiwan and Hong Kong *ARE* creating new characters at the drop of a hat. People just keep saying that to scare everyone. Sounds like an urban myth to me. I think there is some confusion between "new characters" in the sense that they were never available in any standard, but which are taken from pre-existing print sources, and now people would like to properly add them; versus "new characters" that were made up "yesterday" for frivolous reasons. Thomas Chan [EMAIL PROTECTED]
RE: Devanagari question
Antoine Leca wrote: My understanding is that there are a number of similar cases, which are not officially prohibited (AFAIK), but do not carry any sense. For example, how about digits followed by accents (as combining marks)? Or the kana voicing/voiceless combining marks, when they follow anything other than hiragana or katakana? I think that the original idea behind having combining marks in Unicode was that *any* combination of base + diacritic should be permitted, and be handled decently by rendering engines. The reason for this is that there are thousands of languages in the world, and their orthographies may require an uncommon usage of diacritics. E.g., talking about katakana, I think that the orthography of the Ainu language requires some combinations of syllable + voiced sign (or was it syllable + semivoiced sign) that do not exist in Japanese. If font designers and d. engine implementers insist on the idea that an "accented letter" may be rendered only if an ad-hoc glyph has been anticipated in the font, many minority languages will never have a chance of being supported at a reasonable cost. This is not to say that fonts should *not* have precomposed glyphs. Precomposed glyphs are useful in *some* cases, for providing a *better*, *nicer* rendering for *some* troublesome combinations. Less common combinations, used in less known languages, may get along with a less-than-perfect rendering -- but *no* rendering at all is not acceptable, IMHO! Sorry for stating the obvious, but I see that actual implementations often have an attitude towards precomposed glyphs that I don't see a reason for. _ Marco __ La mia e-mail è ora: My e-mail is now: marco.cimarostiªeurope.com (Cambiare "ª" in "@") (Change "ª" to "@")
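Marco's distinction between combinations that have precomposed forms and those that do not is visible in Unicode normalization itself; a short illustration using the standard unicodedata module:

```python
import unicodedata as ud

# NFC maps common base + diacritic pairs to precomposed characters, but
# uncommon pairs (a digit with an acute accent, say) have no precomposed
# form and stay as two code points -- the renderer must stack them itself,
# which is exactly the fallback case Marco and Antoine are debating.
common = ud.normalize("NFC", "e\u0301")    # e + COMBINING ACUTE ACCENT
uncommon = ud.normalize("NFC", "4\u0301")  # 4 + COMBINING ACUTE ACCENT

print(len(common))     # 1 -- the precomposed U+00E9 (é) exists
print(len(uncommon))   # 2 -- no precomposed form; stays decomposed
```

A font can anticipate a glyph for the first case; for the second, only generic mark positioning (or overstriking) can save it.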
Re: Devanagari question
The Unicode Standard does define the rendering of such combinations, which is in the absence of any other information to stack outwards. Implementations that can't do that will either overstrike, or use some other fallback rendering. A sophisticated rendering will use positioning such as control-point matching to get optimal positioning. A dumb implementation would simply move the accent outwards if there was one in the same position. This will not necessarily produce an optimal positioning, but should be readable. It may take a non-trivial amount of code to do the former (especially if it means adding control-point hinting, as in TrueType). If it takes megabytes of code to do the latter, there is probably something else wrong. Mark - Original Message - From: "Antoine Leca" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, November 13, 2000 10:11 Subject: Re: Devanagari question Marco Cimarosti wrote: Antoine Leca wrote: My understanding is that there are a number of similar cases, which are not officially prohibited (AFAIK), but does not carry any sense. I think that the original idea behind having combining marks in Unicode was that *any* combination of base + diacritic should be permitted, The fact that it is permitted (as I said, they "are not prohibited") does not per se give them any sense... This was my point, but I was not clear enough. and be handled decently by rendering engines. The question here is the meaning of "decently".
I beg your pardon, but as the programmer of a rendering engine, I cannot agree that I should spend hours and days, and furthermore add megabytes of code, to render "decently" combinations like digits + accents (by decently, I mean I should check if the glyph for the digit has an ascender above x-height, or is of narrower width, and then adjust the position of the diacritic accordingly; similarly, adjusting the descender position of the Nagari virama according to the descender depth of a preceding "g" or "j" or "y".) On the contrary, I believe that when a combination is not expected, the renderer should have a very basic and straightforward behaviour, and just "print" the default glyphs in order, with overstriking when the second glyph is a combining mark. Doing something more complex, in addition to being IMHO a complete loss of time for both the programmer and the users (who must load unused code), is also likely to give some users the idea that such weird combinations are handled this ("clever") way everywhere, thus leading to chaos when the data is brought elsewhere. If font designers and d. engines implementers insist in the idea that an "accented letter" may be rendered only if an ad-hoc glyph has been anticipated in the font, many minority languages will never have a chance of being supported at a reasonable cost. I never said (nor, I hope, implied) such an idea. Now, insisting that any renderer should align properly any diacritic on the top (or bottom) middle of the I, M and W glyphs will have the net result that nobody will ever be able to create any renderer... Less common combinations, used in less known languages, may get along with a less-than-perfect rendering -- but *no* rendering at all is not acceptable, Where did anyone state such an idea? Antoine
RE: Devanagari Consonant RA Rule R2
On Wed, 8 Nov 2000, Apurva Joshi wrote: The RA[sup] is seen applied to the independent vowel Vocalic R (U+090B) in printed samples in Sanskrit. There are at least the following words that contain the above: NaiRiTa (the name of a demon) = 0928 090B Ra[sup] 0924 NaiRiTi (the goddess Durga, slayer of demons) = 0928 090B Ra[sup] 0924 0940 NaiRiTYa (south-west) = 0928 090B Ra[sup] 0924 094D 092F The Devanagari shaping engine in Uniscribe currently recognises a 0930 094D preceding only consonants, to be duly reordered to the end of the syllable and replaced with Ra[sup]. Whether this be extended to independent vowels had figured in internal discussions when the shaping engine was being planned. To the best of my knowledge, extending this to be applicable to Vocalic R would be a special case, because Ra[sup] is not seen to be applied to any other Indic vowel in words that are native to Indic languages. Would be glad to hear from any expert on this list if there are phonemes/sounds in any language which, when transliterated into Devanagari, would require the Ra[sup] to be applied to an independent vowel, e.g. vowel E Ra[sup] etc. Thanks, -apurva -Original Message- From: Eric Mader/Cupertino/IBM [mailto:[EMAIL PROTECTED]] Sent: Wednesday, November 08, 2000 10:24 AM To: Unicode List Subject: Devanagari Consonant RA Rule R2 Hello, In the Devanagari section of the standard, rule R2, on page 217 of the version 3.0 standard, states, "If the dead consonant RA[d] precedes either a consonant *or an independent vowel,* then it is replaced by the superscript nonspacing mark RA[sup]..." I've never seen a RA[sup] applied to an independent vowel, and none of the software I can find that renders Devanagari does this; they all render a dead RA followed by the vowel. Is the rule in error, or is it written to cover some obscure case that most software doesn't bother with? Eric Mader Wednesday, November 8, 2000 First, I'm not an expert in Sanskrit but have done some work with Devanagari.
I think at figure 9-3 (4) on page 214 and at R2 on page 217, Unicode 3.0 overstates and misstates the situation a bit. What is being described is, I believe, a rendering issue, not an encoding issue. Instead of involving an independent vowel, it involves the r consonant, U+0930, immediately followed by the R vowel sign (matra), U+0943, which happens to get rendered as the independent vowel, U+090B, with the superscript R, reph, above it - with no halant between the consonant and the vowel sign. On page 24 of Hester Lambert's Introduction to Devanagari: "The vowel sign of [U+090B] is not written with [U+0930, 094D]. The character representing [U+0930, 094D] with [U+090B] is written with the superscribed stroke used to represent [U+0930, 094D] when it is to be realized before another consonant character without an intervening vowel [i.e. reph]. This stroke is placed over the vowel character [U+090B], as in [U+0928, 093F, 090B, reph, 0924, 093F] nirrti." The order of filing 'nirri' (dot under second r) in Monier Williams' Sanskrit-English dictionary (page 554, column 2) tends to confirm this interpretation: after nirUha it has nirri, nirrich and nirrij (with a dot below the second r), followed by nire. It is possible that this peculiar rendering practice would extend to RA followed by U+0944, 0962 or 0963, but they seem to me too unlikely to dwell on. I suppose (by analogy to having two ways to encode many letters with diacritics) Unicode could allow two ways to encode what looks like "R vowel with reph"; at present it describes the one with a halant but is silent about the display when the r consonant is immediately followed by the r matra, U+0930, 0943. Regards, Jim Agenbroad ( [EMAIL PROTECTED] ) The above are purely personal opinions, not necessarily the official views of any government or any agency of any. Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
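The R2 discussion above reduces to a small classification question. The following Python sketch encodes the consonant-only reading that Apurva says Uniscribe implements; the function name and the simplified consonant-range check are illustrative only, not shaping-engine code:

```python
# Sketch of rule R2 as implemented by Uniscribe per Apurva's message: a dead
# RA (U+0930 + virama U+094D) becomes a reph only when the next character is
# a consonant. Whether it should also fire before independent vowels such as
# vocalic R (U+090B) is exactly the open question in this thread.
RA, VIRAMA = "\u0930", "\u094d"

def is_consonant(ch: str) -> bool:
    # Core Devanagari consonant range KA..HA; a real engine checks more cases.
    return "\u0915" <= ch <= "\u0939"

def forms_reph(syllable: str) -> bool:
    return (syllable.startswith(RA + VIRAMA)
            and len(syllable) > 2
            and is_consonant(syllable[2]))

print(forms_reph("\u0930\u094d\u0924"))   # RA + virama + TA -> True
print(forms_reph("\u0930\u094d\u090b"))   # RA + virama + vocalic R -> False
```

Extending R2 to independent vowels, as the standard's wording suggests, would amount to widening that single range test, which is why the thread treats it as a deliberate implementation decision rather than a technical obstacle.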