On Tue, 11 Jun 2013, Stephan Stiller wrote:
How is the placement of vowel marks around ligatures
handled in Arabic text?
I'm also wondering how font designers normally handle this.
Older fonts in older operating systems (like Windows XP)
often failed. See
On Wed, 12 Jun 2013, Richard Wordingham wrote:
While the same principle applies to Indic scripts (and indeed, to the
Roman alphabet), there is only one Indic mark I can think of for which
the issue of component association arises, and that is the nukta.
Sanskrit requires candrabindu U+0901
On Thu, 7 Feb 2013, Raymond Mercier wrote:
I am using the full commercial Adobe Acrobat version 6, running on XP.
If there is more than one word, the order of words IS correct,
but the order of characters in each word is reversed.
(I don't know about your program.)
You can find out how
U+0305 Combining overline
U+0332 Combining low line
should both connect on left and right.
Which software (program and font) actually does this
when you overline/underline gh?
Test at
http://www.user.uni-hannover.de/nhtcapri/combining-marks.html
On Tue, 16 Oct 2012, Jukka K. Korpela wrote:
... BabelPad ...
But not even ISO-8859-1.
The schoolboys working for Google are also too dumb
to process charset=ISO-8859-1 correctly.
For example, see
http://groups.google.com/group/sfnet.huuhaa/msg/467be25522963c61dmode=source
On Fri, 19 Oct 2012, I wrote:
http://groups.google.com/group/sfnet.huuhaa/msg/467be25522963c61dmode=source
Correction:
http://groups.google.com/group/sfnet.huuhaa/msg/467be25522963c61?dmode=source
On Sat, 6 Oct 2012, Bill Poser wrote:
Characters with a combining low line encoded as a single Unicode
codepoint are rendered correctly. Thus 's' followed by U+0332
is rendered as 's' followed by a low line, but U+1E95
LATIN SMALL LETTER Z WITH LINE BELOW is correctly rendered
with the
On Sun, 7 Oct 2012, Leonardo Boiko wrote:
Inspecting the Courier New font, version 5.11, I noticed that
the advance width of the glyph for U+0332 (glyph uni0331)
is 1129 units. I think this explains it all. The advance width
should be 0.
And other fonts have the same problem, at least the
On Mon, 8 Oct 2012, Jukka K. Korpela wrote:
http://www.user.uni-hannover.de/nhtcapri/combining-marks.html
Your test page is interesting, but it postulates the use
of style sheet switching,
You are always free to define your preferred font family
in your browser’s preferences, no? You may
On Wed, 5 Sep 2012, Petr Tomasek wrote:
Well, isn't Romanization a special case of transliteration?
Romanization of Chinese is certainly not a transliteration.
This holds for other scripts listed under
http://www.loc.gov/catdir/cpso/roman.html
as well.
On Fri, 17 Aug 2012, Jukka K. Korpela wrote:
There is an essential difference between using combining mark
and using a precomposed character:
...
In searches, for example, they do not match.
At least in Google, they match:
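The difference between a combining sequence and a precomposed character can be sketched in Python with the standard unicodedata module. This is only an illustration: the choice of U+00E9 as the sample character and the helper name same_after_nfc are assumptions made for the example, and nothing here claims to describe how Google or any other search engine compares text.

```python
import unicodedata

# Two ways to write the same letter (sample character chosen for illustration).
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A raw code point comparison treats the two spellings as different strings.
print(precomposed == combining)
# After normalization the two spellings compare equal.
print(same_after_nfc(precomposed, combining))
```

A search that matches both spellings presumably normalizes its index and its queries in some comparable way; a search that does not will treat them as different strings.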
On Fri, 17 Aug 2012, Michael Everson wrote:
http://www.user.uni-hannover.de/nhtcapri/combining-marks.html
To change fonts quickly, choose among different style sheets
in your browser:
How? I'm using Safari.
If Safari doesn't let you select alternate stylesheets
then you can't change fonts
On Wed, 15 Aug 2012, Jameson Quinn wrote:
...
I'd like to see at least 20 glyphs for the (horizontal-barred) numerals.
...
Do others agree that it's needed?
Certainly not. Mayan numerals will disappear after 21 December 2012.
On Mon, 13 Aug 2012, Otto Stolz wrote:
http://www.machsmit.de/media/mainteaser/header-ichwillserleben.png
http://www.machsmit.de/kampagne/printmedien.php
show what the braindead German DIN keyboard layout has done to
the apostrophe (’): Killed by the acute accent (´).
Andreas’ example does
On Mon, 13 Aug 2012, Karl Pentzlin wrote:
The problem I am confronted with is that this character shares
its German name Raute with the #
I learnt in 7th grade what “Raute” means.
“#” is not a Raute.
The center field of “#” is called Raute or Rhombus.
BTW, Herr Pentzlin:
Is it correct that
U+0069 U+20D7
U+006A U+20D7
should have a dot and that
U+0131 U+20D7
U+0237 U+20D7
U+006B U+20D7
should have no dot?
On Wed, 1 Aug 2012, Kent Karlsson wrote:
Not sure why you include k here (which has no dot any which way)...
Just a little hint because my question might look too strange.
i, j, k with arrow are used in mathematics and physics
to denote the vectors (1,0,0) , (0,1,0) and (0,0,1) .
Sometimes I
To obtain the Bengali conjunct (ligature) tka,
I write
ta virama ka
U+09A4 U+09CD U+0995
This worked fine in Windows XP but it no longer works
with the fonts Shonar Bangla and Vrinda in Windows 7.
Is there an explanation?
On Sun, 20 May 2012, Michael Everson wrote:
- kh with *continuous* underline (romanization of U+0959) ?
No. Whose romanization is that?
http://www.loc.gov/catdir/cpso/romanization/hindi.pdf
http://homepage.ntlworld.com/stone-catend/trimain3.htm
On Sat, 19 May 2012, Michael Everson wrote:
The free Rupakara font, which was introduced to support
the INDIAN RUPEE SIGN when it was accepted for encoding,
has been updated to include the TURKISH LIRA SIGN.
See http://evertype.com/fonts/rupakara/
I don't see here:
- n with tilde, U+091E
-
On Sun, 20 May 2012, Michael Everson wrote:
I do not understand what it is you are after.
I meant:
Does your font include
- n with tilde (romanization of U+091E)
- kh with *continuous* underline (romanization of U+0959) ?
On Tue, 15 May 2012, announceme...@unicode.org wrote:
Recognizing the urgent need to support the new currency symbol in
information systems, the Unicode Consortium has scheduled its next
release, Unicode 6.2, for the third quarter of 2012.
That release will include the new character, U+20BA
On Wed, 16 May 2012, Denis Jacquerye wrote:
How about U+1E1C, U+1E1D
Hebrew U+05B1
U+1E4E, U+1E4F
I don't know.
U+1E64, U+1E65, U+1E66, U+1E67 ?
Hebrew U+FB2D and U+FB2C (in this order)
Which transliteration systems are they from?
ISO 259 (1984)
On Wed, 16 May 2012, Denis Jacquerye wrote:
U+1E00 and U+1E01 are also a mystery.
You can find letter a with ring below in the title
Grammar of the Pasto or language of the Afghans
by Ernest Trumpp, published 1873.
http://www.google.co.uk/search?q=%22P%E1%B8%81%E1%B9%A3%CC%8Ct%C5%8D%22
I don't
On Fri, 4 May 2012, Michael Probst wrote:
This is *not* about Verdana etc. but rather
http://www.hairetikos.info/afinalquestion.pdf
It seems to me that you have a problem with TeX, not with Unicode.
You should complain in a forum/mailing list dealing with TeX.
On Wed, 2 May 2012, Asmus Freytag wrote:
a document that not only describes the issues
but provides a suggested solution.
Suggested solution:
Correct the typefaces Comic Sans MS, Tahoma, Verdana
in the same way as the typeface Trebuchet MS has been corrected:
Make U+2018 a rotational image of
On Sun, 29 Apr 2012, Asmus Freytag wrote:
So, one of the most useful things that could come of the current
discussion, would be a thorough documentation of the glyph variations
needed to support both English and German for the same quotation mark
characters.
Actually, the case is quite
On Fri, 27 Apr 2012, Michael Probst wrote:
http://www.hairetikos.info/Ux2018_is_not_RIGHT_HIGH_6.pdf
This is well known. Please read this old thread:
http://unicode.org/mail-arch/unicode-ml/y2006-m06/thread.html#30
A few fonts from Microsoft/Monotype are broken:
On Mon, 16 Apr 2012, arno.s wrote:
U+1E96 has the note "Semitic transliteration". Indeed U+1E96 to
U+1E9A are used for transliterating Arabic according to ISO 233.
w with ring is waw with sukun.
but *any* consonant occurs with sukun, so why did they not
encode b with ring, d with ring, d with
On Sun, 15 Apr 2012, David Starner wrote:
At Wiktionary, we're looking at ẘ (U+1E98) and we can't figure out
where it came from. It's from Unicode 1.1, which makes it hard to look
up discussion on adding it, and the characters around it don't seem to
give clues to its origin.
U+1E96 has the
I come back to
http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/thread.html#11
A similar problem of showing non-joining, isolated Arabic glyphs
can be seen in the attached file. Both Internet Explorer 8 and
MS Word 2010 display isolated glyphs in some cases.
I think a better idea is to
On Mon, 26 Mar 2012, Escape Landsome wrote:
In Arabic, when writing a LAM followed by an ALIF, you have a special
ligature of the two letters
Some (broken) fonts do not form the lam-alif ligature when you insert
some non-spacing mark between lam and alif:
Quoting upside-down, Philippe Verdy wrote:
It would help if you created such documents using numeric character
references in your source for all invisible characters and format
controls instead of inserting them literally.
Monsieur Perdu:
Everybody except you understands that I have done
I think the zero-width joiner (ZWJ, U+200D) should join
regardless of typeface. But Internet Explorer 8 won't join
if the ZWJ is taken from another font than surrounding text.
In MS Windows, the font Mangal contains the zero-width joiner
but not Arabic letters. When I specify font-family: Mangal
Arabic letter U+0682 shows two dots above.
It has the cryptic remark "not used in modern Pashto".
But was it ever used?
The new 2011 edition of German standard DIN 31635
Romanization of the Arabic Alphabet
http://www.beuth.de/en/standard/din-31635/140593750
shows the real archaic Pashto letter on
There is a non-standard alif-lam ligature in the Arabic script.
The logo of Al Arabiya shows an example.
Which fonts have such an alif-lam ligature?
Should I write U+0627 ZWJ U+0644 to obtain the ligature? Or
should I write U+0627 ZWNJ U+0644 to prevent the ligature?
Or is alif-lam outside the
On Wed, 19 Oct 2011, Mark E. Shoulson wrote:
interesting that the Latin examples have *compatibility* decompositions,
and the Hebrew/Yiddish digraphs don't even have that
Nevertheless, digraphs and separate letters are the same for Google:
There are three so-called Yiddish digraphs in Unicode:
U+05F0 wawayim
U+05F1 waw yod
U+05F2 yodayim
What is specifically Yiddish about these digraphs?
They can be used in the same way in Hebrew.
But this isn't done. Why not?
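Whether these three code points carry any decomposition can be checked with Python's unicodedata module. The sketch below is an assumption-laden illustration: the Latin digraph U+01C6 is picked only as a comparison point, and the helper name decomposition_of is hypothetical.

```python
import unicodedata


def decomposition_of(ch: str) -> str:
    """Return the decomposition mapping from the Unicode Character Database.

    An empty string means the character has no decomposition at all.
    """
    return unicodedata.decomposition(ch)


latin_digraph = "\u01c6"                           # LATIN SMALL LETTER DZ WITH CARON
yiddish_digraphs = ["\u05f0", "\u05f1", "\u05f2"]  # the three code points listed above

# The Latin digraph has a compatibility decomposition into two letters.
print(f"U+{ord(latin_digraph):04X}", repr(decomposition_of(latin_digraph)))

# The Hebrew/Yiddish digraphs have none, so NFKC leaves them unchanged.
for ch in yiddish_digraphs:
    unchanged = unicodedata.normalize("NFKC", ch) == ch
    print(f"U+{ord(ch):04X}", repr(decomposition_of(ch)), unchanged)
```

So any equivalence between a digraph and its two separate letters has to come from the application, not from Unicode normalization.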
On Wed, 19 Oct 2011, Michael Everson wrote:
What is specifically Yiddish about these digraphs?
They are used in Yiddish orthography.
With digraphs:
http://yi.wikipedia.org/wiki/%EE%F9%E4_%EC%D6%E1_%F8%E0%E1%E9%F0%E0%D4%E9%E8%F9
Without digraphs:
On Mon, 17 Oct 2011, Eli Zaretskii wrote:
However, it could be that the confusion is mine, and it stems
from the fact that the logical order of these characters was not
stated by the OP.
You can read the source text, no?
On Mon, 17 Oct 2011, Eli Zaretskii wrote:
Btw, according to my testing, the current Firefox displays this
this is
http://www.unicode.org/mail-arch/unicode-ml/y2011-m10/att-0059/1999-12-31.html
as 31/12/1999.
Firefox 7 displays 1999/12/31.
I return to
http://www.unicode.org/mail-arch/unicode-ml/y2011-m10/att-0059/1999-12-31.html
Microsoft programs (Internet Explorer, MS Word), display this as
31/12/1999
Other programs (Firefox, Opera, OpenOffice) display this as
1999/12/31
NB:
I do not ask how to write unambiguously. (This
On Fri, 7 Oct 2011, Murray Sargent wrote:
The ASCII solidus is used in various nonmathematical contexts
(dates, alternatives)
It bothers me that different programs display
<HTML>
<H1 dir=rtl align=center>
&#1633;&#1641;&#1641;&#1641;/&#1633;&#1634;/&#1635;&#1633;
</H1>
</HTML>
differently.
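The characters involved can be inspected with Python's unicodedata module. This is only a sketch: it lists the bidirectional class of each character in the test string and makes no claim about which program's rendering is the correct one. The variable name date_text is an assumption for the example.

```python
import unicodedata

# The test string from the HTML fragment: Arabic-Indic digits separated by
# the ASCII solidus (decimal references 1633, 1641, ... written as escapes).
date_text = "\u0661\u0669\u0669\u0669/\u0661\u0662/\u0663\u0661"

# Print code point, bidirectional class and character name for each character.
for ch in date_text:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))

# Collect the distinct bidirectional classes that occur in the string.
classes = {unicodedata.bidirectional(ch) for ch in date_text}
print(sorted(classes))
```

The digits are Arabic numbers (class AN) and the solidus is a common separator (class CS), so the display order depends on how a program resolves separators between numbers in a right-to-left paragraph.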
On Tue, 11 Oct 2011, I wrote:
It bothers me that different programs display [...] differently.
Including HTML in messages as described in
http://www.hypermail-project.org/hypermail.html#6
didn't quite work.
Therefore I attach a tiny HTML file so that you can test
with different
On Tue, 11 Oct 2011, Peter Constable wrote:
It works flawlessly in Firefox (which is the only browser
to support it - Internet Explorer, Chrome and Safari don’t
support it. I don’t know for Opera).
I've scanned this thread and can't figure out what it is.
<span lang=ru>..</span>
is
On Fri, 7 Oct 2011, Gerrit wrote:
So if somebody from Google reads this,
[...]
Additionally, if the standard Android web browser could then
use the html “lang” tag to select the appropriate font,
it would be even nicer.
Mark Davis from Google has confessed on this list
On Tue, 16 Aug 2011, Philippe Verdy wrote:
Even Netscape 4 was able to display all symbols from
http://www.user.uni-hannover.de/nhtcapri/mathematics.html
correctly.
Yes, but probably not the last part of the table (displayed
on the page from the link labelled more...),
That is a
On Sun, 14 Aug 2011, Asmus Freytag wrote:
The Ohm sign should have been encoded as another example of squared
letters and abbreviations. It comes from Asian character sets,
I’d say the ohm sign comes from the MacRoman character set (0xBD).
On Fri, 12 Aug 2011, Leo Broukhis wrote:
http://www.numericana.com/about.htm
The author Gerard P. Michon is clueless.
Even Netscape 4 was able to display all symbols from
http://www.user.uni-hannover.de/nhtcapri/mathematics.html
correctly.
On Fri, 5 Aug 2011, Doug Ewell wrote:
UTF-8 has the property of being easily detected and verified
as such, which solves part of the Google Groups problem
(inability to detect which SBCS is being used).
No, it doesn't solve it. The schoolboys working for Google are so dumb
that they even assume
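The detectability property quoted above can be sketched in Python: a strict UTF-8 decode fails on most text that was written in a single-byte charset. The function name looks_like_utf8 and the sample word are assumptions for illustration. This is not a description of what Google Groups does.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # the default error handler is strict
        return True
    except UnicodeDecodeError:
        return False


sample = "Gr\u00fc\u00dfe"                  # sample word with two non-ASCII letters
latin1_bytes = sample.encode("iso-8859-1")  # one byte per non-ASCII letter
utf8_bytes = sample.encode("utf-8")         # two bytes per non-ASCII letter

print(looks_like_utf8(latin1_bytes))
print(looks_like_utf8(utf8_bytes))
```

A check like this can only say that the data is not UTF-8. It cannot tell which single-byte charset was used, which is why the declared charset parameter still matters.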
On Fri, 5 Aug 2011, I wrote:
Example:
http://groups.google.com/group/sfnet.huuhaa/msg/4a7b0cae182e8c50
http://groups.google.com/group/sfnet.huuhaa/msg/4a7b0cae182e8c50dmode=source
Make that:
http://groups.google.com/group/sfnet.huuhaa/msg/4a7b0cae182e8c50?dmode=source
On Tue, 5 Jul 2011, Philippe Verdy wrote:
Even MS Word 2010 continues to use U+001F as soft hyphen
but does not recognize U+00AD as soft hyphen.
I've not spoken at all about U+001F and not even tested it
alt+0031
alt+0173
I have entered TRUE soft hyphens as U+00AD, in a plain-text
On Sun, 3 Jul 2011, Jukka K. Korpela wrote:
You're wrong, it DOES. I just tested it (in Microsoft Word 2010 for
Windows 7) within a random long word (aa) and the SHY
is recognized to generate the intended hyphenation break.
That’s good news, if your analysis is correct, but the
On Fri, 1 Jul 2011, Peter Krefting wrote:
Not that it matters much, just something we noticed.
Peter Krefting - Core Technology Developer, Opera Software ASA
I noticed something that matters -- namely that Opera isn't
really fit to display bidirectional text and documents.
For example:
On Mon, 9 Aug 2010, Jukka K. Korpela wrote:
It is of course transliteration standards that should say something
normative about the matter. As far as I can remember, the authoritative
versions of the relevant standards are the paper publications, which
do not identify characters by Unicode
On Tue, 27 Jul 2010, Arno Schmitt wrote:
Since U+0649 is called alif maqsura it should be used for alif maqsura.
By that argument, you would have to use U+0027 for an apostrophe instead
of U+2019.
The Unicode names for characters are often historical and
you should not infer anything from such names.
On Tue, 27 Jul 2010, David Starner wrote:
MacArabic, Windows-1256 and ISO-8859-6 are all standards for
the encoding of Arabic. Thus U+0649 must be an Arabic character;
existing use in those sets and in Unicode says that it is.
By that circular logic, S with cedilla and T with cedilla
must be
On Tue, 27 Jul 2010, Khaled Hosny wrote:
According to Grammatik des klassischen Arabisch by Wolfdietrich Fischer,
page 9, the ya is written with two dots in such cases, too.
Except that this is not a Yaa and not pronounced like a Yaa, it is an
Alef (note the small dagger Alef above it).
That is
On Wed, 28 Jul 2010, lingu...@artstein.org wrote:
Here's an arbitrary page from today's Al-Ahram newspaper,
[...]
On my computer this looks particularly jarring,
You can find enough pages from Continental Europe and Latin
America that have an acute accent instead of an apostrophe
due to
On Thu, 22 Jul 2010, lingu...@artstein.org wrote:
[...]
To wrap up, are my observations about the Pashto writing conventions
correct? And is there a standard for assigning the Pashto characters
representing /j/ and /i:/ to Unicode code points?
Practical answer:
U+0649 and U+064A are
On Tue, 6 Jul 2010, Shawn Steele wrote:
Often the author seems to use the same code page
they were expecting as a system default, so it can appear
to work for them even when it's wrong.
I am the author of this news message:
On Wed, 7 Jul 2010, Shawn Steele wrote:
however, in general, perhaps not your specific case,
the charset tag on the web cannot be 100% reliably trusted,
regardless of what the RFCs say.
You do not understand what I mean!
You have missed my point completely!
You DO NOT understand me!
On Thu, 1 Jul 2010, John Burger wrote:
If you have never encountered a web page in which the charset
parameter encoded in the page (or in the HTTP headers) did not
accurately reflect the real charset, as indicated by the actual
data in the page
How is it possible that you noticed that?
It's
On Mon, 28 Jun 2010, Mark Davis wrote:
I'll overlook the lack of civility, since I can understand
that kind of frustration when something doesn't work.
Well, I have been aware of this problem/bug for many years now:
http://groups.google.co.uk/group/sci.lang/msg/eb55255e1925350f
Over the years I
On Thu, 24 Jun 2010, Leo Broukhis wrote:
a privilege (unique identity) available only to major currencies
like dollar, euro, pound sterling and yen.
Even in the year 2010, the euro sign (€) doesn't work reliably.
On Fri, 25 Jun 2010, I wrote
Even in the year 2010, the euro sign (€) doesn't work reliably.
in both the Unicode list and in the newsgroup de.test.
unicode.org shows a euro sign:
http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html
groups.google.com shows a currency sign:
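One way a euro sign can turn into a currency sign is sketched below in Python. The snippet above does not say which charset the message declared, so the following is purely an assumption for illustration: a message sent as ISO-8859-15 and read as ISO-8859-1. The variable names are hypothetical.

```python
import unicodedata

# Assumption: the euro sign was sent as the single byte 0xA4 (ISO-8859-15).
euro_byte = b"\xa4"

as_latin9 = euro_byte.decode("iso-8859-15")  # charset that has the euro at 0xA4
as_latin1 = euro_byte.decode("iso-8859-1")   # charset that has no euro sign

# Print the resulting code point and character name for each interpretation.
print(f"U+{ord(as_latin9):04X}", unicodedata.name(as_latin9))
print(f"U+{ord(as_latin1):04X}", unicodedata.name(as_latin1))
```

If the declared charset is ignored or replaced by ISO-8859-1, the same byte comes out as a different character, which would match the symptom described.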