21.12.2016, 4:29, Martin Mueller wrote:
Is there a Unicode character that says “I represent an alphanumerical
character, but I don’t know which”.
I think including such a “character” in Unicode would not fit into the
idea of Unicode as a system for encoding plain text characters. You
6.10.2016, 19:27, Ken Whistler wrote:
Their functions have been completely overtaken by markup conventions
such as ... and ..., which *are* widely supported
already, even in most email clients, ri^ght out of the b_ox .
They are widely supported, but very widely in a typographically inferior
6.10.2016, 17:55, Frédéric Grosshans wrote:
On 06/10/2016 at 09:21, Marcel Schneider wrote:
I did never see that. Would you show us some examples to look up? Iʼm
curious
whether they could be managed without accented superscripts.
Anyway, combining diacritics should be placeable on
3.10.2016, 20:40, Leonardo Boiko wrote:
Besides, there are already control/formatting characters for such
purposes – several ones, even. They look like this: <sup></sup>, ^{},
\textsuperscript{}, \*{ \*} …
They are not control or formatting characters. They are markup used at
higher protocol levels –
1.10.2016, 11:29, Khaled Hosny wrote:
On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
[...]
>> What I was pointing at was that when using
rich text or markup, it is complicated or impossible to have typographically
correct glyphs used (even when they exist), whereas t
30.9.2016, 19:36, Philippe Verdy wrote:
2016-09-30 17:54 GMT+02:00 Jukka K. Korpela <jkorp...@cs.tut.fi>:
Using HTML, for example, the way to achieve that at present would be
to use markup like ... (to avoid the
problems caused by the defaul
30.9.2016, 19:11, Leonardo Boiko wrote:
The Unicode codepoints are not intended as a place to store
typographically variant glyphs (much like the Unicode "italic"
characters aren't designed as a way of encoding italic faces).
There is no disagreement on this. What I was pointing at was that
30.9.2016, 18:19, Philippe Verdy wrote:
Note also that many tools generating documentation from source code
allow you to insert HTML comments, so you could as well use ,
Yes, but there’s a serious typographic pitfall with this, as well as
with using e.g. subscript or superscript formatting
30.9.2016, 12:57, Gael Lorieul wrote:
I wonder why only a subset of the alphabet is available as subscript
and/or superscript ?
This is explained in section 22.4 of the standard:
http://www.unicode.org/versions/Unicode9.0.0/ch22.pdf#page=25
To put it briefly, in my interpretation, subscript
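One way to see which basic Latin lowercase letters have a superscript counterpart is to scan the Unicode Character Database for compatibility decompositions tagged <super>. A minimal sketch, assuming Python's standard unicodedata module; its answer depends on the Unicode version bundled with the interpreter, so the list of missing letters can differ from what section 22.4 of Unicode 9.0 describes.

```python
import string
import sys
import unicodedata

def superscript_forms() -> dict:
    """Map each ASCII lowercase letter to the characters whose
    compatibility decomposition is exactly "<super>" plus that letter."""
    forms = {letter: [] for letter in string.ascii_lowercase}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Single-letter superscript decompositions look like ["<super>", "0061"].
        if len(parts) == 2 and parts[0] == "<super>":
            base = chr(int(parts[1], 16))
            if base in forms:
                forms[base].append(chr(cp))
    return forms

forms = superscript_forms()
missing = [letter for letter, chars in forms.items() if not chars]
print("Letters with no <super> form in this UCD version:", missing)
```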
28.10.2015, 11:59, Rafael Sarabia wrote:
I need to use a document both in Word 2007 for Windows and Word 2011 for
Mac and I'm finding some incompatibility issues.
Before going into the details of plain text file encodings, I think it
is important to decide whether you need to use plain text
2014-12-18, 12:31, Andrea Giammarchi wrote:
I wonder if it's by accident that 00AE, 00A9, and 2122 are not listed
as standard variant sensitive chars.
Why would that be an accident any more than not listing 100,000 other
characters there? Or to put it more constructively, why should they
2014-10-24 15:05, Shriramana Sharma wrote:
Hi Martin. If you haven't noticed it before, opening Unicode charts in
PDF readers has something like SECURED on the top i.o.w. the charts
are sorta DRM-protected. So you can't copy-paste the characters. Heck
you can't even copy-paste the character
2014-08-15 1:52, Peter Constable wrote:
For those interested, there is an update for Windows available now to
add font, keyboard and locale data support for the Ruble sign that was
added in Unicode 7.0. For details, see here:
http://support.microsoft.com/kb/2970228
The update seems to have
2014-07-02 6:10, James Clark wrote:
The unalom is widespread in Thailand. For example, the Thai Red Cross
Society was originally founded as the Red Unalom Society, and its logo
was a red Unalom combined with a cross. It forms the main component of
the seal of Rama I (founder of the current Thai
2014-07-02 20:34, Philippe Verdy wrote:
CGJ would be better used to prevent canonical compositions but it won't
normally give a distinctive semantic.
In the question, visual difference was desired. The Unicode FAQ says:
“The semantics of CGJ are such that it should impact only searching and
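The effect of CGJ on normalization (as opposed to display) can be checked directly. A small sketch using Python's unicodedata: CGJ has canonical combining class 0 and composes with nothing, so placing it between a base letter and a combining mark blocks canonical composition, while a renderer would normally show the same visual result.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

plain = "a\u0301"             # a + COMBINING ACUTE ACCENT
blocked = "a" + CGJ + "\u0301"

# Without CGJ, NFC composes the pair into U+00E1.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", plain)])    # ['U+00E1']
# With CGJ in between, composition is blocked and all three code points survive.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", blocked)])  # ['U+0061', 'U+034F', 'U+0301']
```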
2014-07-02 19:11, Leo Broukhis wrote:
Here
https://upload.wikimedia.org/wikipedia/commons/a/a4/Contrastive_use_of_kratka_and_breve.JPG
is an example of й and и + U+0306 COMBINING BREVE used contrastively
(/j/ vs short /i/) thanks to a difference in typographic style of
Cyrillic breve (kratka)
2014-06-29 21:44, Koji Ishii wrote:
The spec currently has the following text[2]:
Control characters (Unicode class Cc) other than tab (U+0009), line
feed (U+000A), and carriage return (U+000D) are ignored for the
purpose of rendering. (As required by [UNICODE], unsupported
Default_ignorable
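The quoted rule about control characters can be sketched as a filter on General Category Cc, keeping only tab, line feed, and carriage return. This is an illustration in Python, not the spec's algorithm; note that format characters such as U+200B are category Cf, not Cc, so they pass through and would fall under the separate default-ignorable handling mentioned in the quote.

```python
import unicodedata

KEEP = {"\t", "\n", "\r"}

def drop_control_chars(text: str) -> str:
    """Remove General Category Cc characters except TAB, LF and CR."""
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) != "Cc"
    )

print(repr(drop_control_chars("a\x00b\tc\x1bd\u200be\n")))  # 'ab\tcd\u200be\n'
```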
2014-06-30 0:48, David Starner wrote:
On Sun, Jun 29, 2014 at 2:02 PM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:
They might be seen as “not displayable by normal rendering”, so yes. On the
practical side, although Private Use characters should not be used in public
information interchange
2014-06-04 15:32, Hans Aberg wrote under Subject: Re: Swift:
On 4 Jun 2014, at 13:58, Leonardo Boiko leobo...@namakajiri.net
wrote:
I don't think this feature saw much use, since programmers in a
global world can't assume that everyone will have easy access to
their input methods, and so tend
2014-06-04 17:42, Ian Clifton wrote:
Jukka K. Korpela jkorp...@cs.tut.fi writes:
As an aside, the ISO 80000-2 standard on mathematical notations
describes boldface letters such as boldface R as symbols for commonly
known sets of numbers. The double-struck letters like ℝ as mentioned
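Both the double-struck ℝ and the mathematical boldface R are compatibility variants of the plain letter R, so the distinction disappears under compatibility normalization. A quick check with Python's unicodedata:

```python
import unicodedata

variants = ("\u211D", "\U0001D411")  # DOUBLE-STRUCK CAPITAL R, MATHEMATICAL BOLD CAPITAL R
for ch in variants:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> NFKC {folded!r}")  # both fold to 'R'
```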
2014-06-04 20:15, Andre Schappo wrote:
Well because outside of groups like this there is still little awareness
of Unicode, little understanding of Unicode, little willingness to use
Unicode and little conscious usage of Unicode
That’s very true. In the specific case of “using Unicode” (which
2014-06-03 19:13, Asmus Freytag wrote:
Unicode normally does not document all known usages of symbols.
Not to mention unknown usages. Characters will be used in different
ways, no matter what the Unicode Standard says, and it would be mostly
pointless to put restrictions on it. In some
2014-04-02 21:56, Whistler, Ken wrote:
U+23AF is *definitely* not a variation selector at all.
It is part of a set of bracket pieces (and other graphic pieces)
in the range U+239B..U+23B1.
[…]
These glyphic pieces of symbols are only relevant and useful
in the context of mathematical
2014-03-29 13:01, Asmus Freytag wrote:
On managing some types of spacing between elements in running text:
On 3/27/2014 8:04 AM, Jukka K. Korpela wrote:
[…]
The “fixed-width spaces” are mostly just legacy characters, holdover
from old typography. They may have their uses, though, in contexts
2014-03-27 10:13, William_J_G Overington wrote:
Does regular Unicode have a character that looks like a space to a
human yet is not treated as a space by software please?
It depends, among other things, on what you mean by “space”.
There’s U+00A0 NO-BREAK SPACE, which surely isn’t the same
2014-03-27 15:10, Kalvesmaki, Joel wrote:
William, try the U+2000..U+200A glyphs under General Punctuation--I think
that's what you're looking for to manage precise widths of blank space.
That range contains some “fixed-width spaces”, yes. Being “fixed-width”
is rather relative here, though,
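Whether software treats a character as a space depends on which property it checks. As one concrete definition, Python's str.isspace() is true for the “fixed-width spaces” U+2000..U+200A and for U+00A0 NO-BREAK SPACE (all category Zs), but false for U+200B ZERO WIDTH SPACE (category Cf). A sketch listing them:

```python
import unicodedata

candidates = ["\u0020", "\u00A0", *map(chr, range(0x2000, 0x200B)), "\u200B"]
for ch in candidates:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<26} "
          f"category={unicodedata.category(ch)} isspace={ch.isspace()}")
```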
2014-02-10 21:49, Richard Wordingham wrote:
U+200B has the distinct advantage of being a character, and therefore
readily travelling with the words it separates.
Granted, but it’s still a character that the rendering software needs to
know and support in order to have the desired effect. As
2014-02-10 22:30, Philippe Verdy wrote:
No I make no confusion: <wbr> is a formatting HTML element, SHY (or
&shy; in HTML syntax for the defined entity) is a character. Both play
equivalent roles in HTML,
Not at all.
except that &shy; has a defined behavior to
insert an hyphen at end of
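The difference shows up as soon as the HTML source is decoded: the soft hyphen entity is just a reference to the character U+00AD (a format character, category Cf), while wbr is an element that exists only at the markup level. A small illustration with Python's html module:

```python
import html
import unicodedata

decoded = html.unescape("long&shy;awesome")
print(repr(decoded))  # 'long\xadawesome': the entity became a character
print(unicodedata.name("\u00AD"), unicodedata.category("\u00AD"))  # SOFT HYPHEN Cf

# <wbr> is a tag, not a character reference, so unescaping leaves it as markup.
print(html.unescape("long<wbr>awesome"))
```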
2014-02-10 9:13, Philippe Verdy wrote:
The <wbr> is enough for this purpose,
No, since the purpose was clearly to specify a line break point that is
preferred over other possible line break points, or even the only
allowed line break point within a string.
The <wbr> tag (an old nonstandard
2014-02-05 18:22, Markus Scherer wrote:
On Tue, Feb 4, 2014 at 2:25 PM, Rhavin Grobert rha...@shadowtec.de
wrote:
Parallel to soft hyphen, a hyphen that is just inserted if the word
was broken, it would be practical to have some way to tell browser:
if
2014-02-05 23:44, Rhavin Grobert wrote:
<wbr> gives the opportunity to break at long|awesome. But what I mean is:
- non-existing <sbr> in parallel to &shy; assumed -
Just giving a hypothetical character or tag an identifier does not
specify its intended meaning.
Do you think me gentle,<sbr/>do
2014-02-04 19:05, James Lin wrote:
For Arabic, percentage sign is fixed on the left side of the digit: %10
There seem to be different opinions and practices on this. In the CLDR
database, the formats have “%” (the ASCII percent sign) on the right of
the number, as far as I can see; Arabic
2013-12-06 0:45, Shriramana Sharma wrote:
In Unicode the characters with precomposed diacritics are given
canonical equivalences to the corresponding sequences of base
characters followed by separate diacritics. So Unicode-compliant
parsing tools should not distinguish between the two.
There
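Canonical equivalence can be tested by normalizing both strings to the same form before comparing them. A minimal Python sketch (the function name is illustrative, not from any standard API):

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent (identical NFD forms)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00E9"  # é as a single code point
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT
print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True
```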
2013-11-04 21:00, Jennifer Wong wrote:
The use case is that customers want to integrate data from our
enterprise solution to their ASCII-based downstream systems.
This is very different from the question about removing accents while
conforming to language standards. The very goal makes it
2013-11-01 17:37, Jennifer Wong wrote:
I would like to ask for advice on removing accents from characters.
To first address the question that you ask in the Subject line but not in
the message body, “How to remove accents while conforming to language
standards?”, the answer is: You
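For contrast, the common naive approach decomposes text and drops nonspacing marks. The Python sketch below is exactly that naive method, shown to illustrate why it does not conform to language conventions: German ü becomes u rather than ue, and letters with no canonical decomposition, such as Ø, survive, so the output is not even guaranteed to be ASCII.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Decompose canonically (NFD), then drop nonspacing marks (Mn).

    Not language-aware: ü becomes u (not ue), and letters with no
    canonical decomposition, such as ø or ß, pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

result = strip_marks("Müller, Ørsted, café")
print(result)            # Muller, Ørsted, cafe
print(result.isascii())  # False: Ø survives
```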
2013-10-29 6:12, d...@bisharat.net wrote:
If one refers to plain ASCII, or plain ASCII text or ...
characters, should this be taken strictly as referring to the 7-bit
basic characters, or might it encompass characters that might appear
in an 8-bit character set (per the so-called extended
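Under the strict reading, “plain ASCII” means 7-bit data only, which is easy to test at the byte level. A Python sketch of that strict check:

```python
def is_strict_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit range 0x00..0x7F."""
    return all(byte <= 0x7F for byte in data)

print(is_strict_ascii("naive".encode("ascii")))    # True
print(is_strict_ascii("naïve".encode("latin-1")))  # False: ï is byte 0xEF
```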
2013-10-26 18:36, Sindre Sorhus wrote:
There are:
⬅ LEFTWARDS BLACK ARROW (U+2B05)
⬆ UPWARDS BLACK ARROW (U+2B06)
⬇ DOWNWARDS BLACK ARROW (U+2B07)
But no right arrow. Why is that?
There is
➡ BLACK RIGHTWARDS ARROW (U+27A1)
The code chart at
http://www.unicode.org/charts/PDF/U2B00.pdf
has a
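The names make the asymmetry visible: the rightwards arrow lives in the Dingbats block with the word order “BLACK RIGHTWARDS”, unlike the three arrows in U+2B05..U+2B07. A quick lookup with Python's unicodedata:

```python
import unicodedata

for cp in (0x2B05, 0x2B06, 0x2B07, 0x27A1):
    print(f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}")

# Lookup by name works in the other direction.
right = unicodedata.lookup("BLACK RIGHTWARDS ARROW")
print(f"U+{ord(right):04X}")  # U+27A1
```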
2013-10-22 21:38, Jean-François Colson wrote:
I know that in some Japanese encodings (JIS, EUC), \ was replaced by a ¥.
Some encodings indeed have “¥” U+00A5 YEN SIGN assigned to code point
0x5C, to which Unicode assigns “\” U+005C REVERSE SOLIDUS. This is
external to Unicode as such,
2013-10-20 2:38, Richard Wordingham wrote:
Is a sequence of a U+25CC DOTTED CIRCLE plus a combining mark plain
text?
Well, is <h1>hello</h1> plain text? The answer is that any string of
characters may be considered as plain text and any string of characters
may be treated as rich text according
2013-10-20 11:47, Jukka K. Korpela wrote:
What you could do in a web page is to put U+00A0 U+25CC in one element
and U+0E31 in another and position the elements in the same place, set
to have the same width and to be horizontally centered.
Oops. I meant U+25CC and U+00A0 U+0E31.
But I’m
2013-10-03 7:46, Martin J. Dürst wrote:
On 2013/10/02 9:52, Leo Broukhis wrote:
Thanks! That comes out exactly right, although using math markup for
linguistic purposes is, IMO, a stretch.
Why? Surely like in other fields (Math to start with), there somewhere
is a boundary between plain text
2013-09-13 22:02, Whistler, Ken wrote:
The *interesting* question, in my opinion, is why folks feel impelled to use
U+2026 to render a baseline ellipsis in Latin typography at all, rather than
just using U+002E ad libitum...
In traditional typography, an ellipsis usually has dots set apart
Under Subject: Re: Why blackletter letters?
2013-09-12 20:20, Stephan Stiller wrote:
Talking about which ...
I confess I usually type a Danish Ø for convenience when I'm using
this, though for publication I would tend to substitute the proper ∅.
Whenever I saw the empty set symbol in printed
2013-09-10 20:36, Jukka K. Korpela wrote:
2013-09-10 20:01, Asmus Freytag wrote:
This rationale is absent in document WG2 N3907 that requests these
characters.
If this is document
http://std.dkuug.dk/jtc1/SC2/wg2/docs/n3907.pdf
then I’m rather confused: it proposes AB51 for LATIN SMALL
2013-09-10 20:01, Asmus Freytag wrote:
This rationale is absent in document WG2 N3907 that requests these
characters.
If this is document
http://std.dkuug.dk/jtc1/SC2/wg2/docs/n3907.pdf
then I’m rather confused: it proposes AB51 for LATIN SMALL LETTER
BLACKLETTER O and does not include LATIN
2013-08-05 23:46, Richard Wordingham wrote:
The requirement is that conformant processes not think they are doing
the right thing by treating canonically equivalent strings
differently. If there is latitude in a process, e.g. rendering, I
can't find a requirement to treat canonically
2013-08-06 9:38, Christopher Fynn wrote:
I wonder why so many servers, database applications, and so on, _still_
don't install with Unicode (in some encoding format) as the *default*
installation option.
There are probably several reasons, but one obvious reason is this: if
the default
2013-07-30 23:50, James Lin wrote:
If you open the Windows character Map, Segoe UI doesn't contain the
snowman while font Meiryo has.
I wrote about Segoe UI Symbol, not Segoe UI.
Meiryo, which is also shipped with Windows 7, indeed contains SNOWMAN.
This makes it even more odd if SNOWMAN is
2013-07-29 23:42, James Lin wrote:
I have a question regarding the supported Unicode code page.
There are no Unicode code pages.
I thought
once you have unicode code page loaded, all glyph or character should be
able to map and display correctly regardless of which OS or language you
are
2013-07-30 4:03, Buck Golemon wrote:
Also, some browsers have odd support for rendering unicode (non-ascii)
urls, for security reasons.
Both chrome and firefox under Windows 7 render http://www.☃.net/
as http://www.xn--n3h.net/ which is the ascii
domain encoding (called
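The ASCII form is the Punycode encoding with the “xn--” prefix. Python's built-in idna codec implements the older IDNA 2003 rules (RFC 3490), so it can reproduce the conversion; under IDNA 2008 symbols such as U+2603 SNOWMAN are disallowed, so other libraries may reject this name.

```python
# Python's built-in "idna" codec follows IDNA 2003, not IDNA 2008.
ascii_form = "www.\u2603.net".encode("idna")
print(ascii_form)                     # b'www.xn--n3h.net'
unicode_form = ascii_form.decode("idna")
print(unicode_form)                   # www.☃.net
```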
2013-07-05 17:01, Dreiheller, Albrecht wrote:
A topic that is different but related to the current discussion writing
in an alphabet with fewer letters: letter replacements
is the question about writing units with limited character sets.
This is not a somehow academical question but a real
2013-06-14 22:30, Stephan Stiller wrote:
On 6/14/2013 11:45 AM, Roozbeh Pournader wrote:
They are unified with the double angle quotation marks. Persian also
uses the round version (and if I remember correctly, Greek too).
Where can one find such information?
It’s somewhat implicit, but
2013-06-15 21:24, Michael Fayez wrote:
And yes as Doug Ewell said characters U+2E28 and U+2E29 can be used in
new data. They have the correct shape and properties though with the
wrong size unfortunately.
Well, U+2E28 has General Category Ps (Punctuation, Open), not Pi
(Punctuation, Initial
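The General Category values can be read from the Unicode Character Database, for example with Python's unicodedata. The double parentheses are Ps/Pe (open/close), whereas the guillemets are Pi/Pf (initial/final quote):

```python
import unicodedata

categories = {}
for ch in ("\u2E28", "\u2E29", "\u00AB", "\u00BB"):
    categories[ch] = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {categories[ch]}")
```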
2013-05-10 13:54, Kiran Kumar Chava wrote:
From one of the books we are trying to Unicodify ... we have below line
।।ఓం పయోఽంబువచ్చేత్ తత్రాపి ఓం।। 3
There must not be that dotted circle before the sunna(zero like Telugu
symbol)
Am I missing something? Or is this a bug?
As
2013-03-10 4:57, Asmus Freytag wrote:
'The Lancet' reportedly insists on the use of the raised decimal point
[…
That's sensible advice, in a way, because B7 is in 8859-1 and therefore
supported in a huge variety of fonts, for practical purposes, the
coverage among non-decorative text fonts is
2013-03-09 21:30, Asmus Freytag wrote:
I believe the Unicode Standard should be fixed by explicitly removing
all suggestions in the text that the raised decimal point is unified
with 002E.
That would be a good move if agreement can be found on the recommended
coding of the middle dot.
2013-02-22 19:46, Leif Halvard Silli wrote:
Questions: Shouldn’t HYPHEN BULLET be on in the NamesList of
HYPHEN-MINUS? And shouldn‘t HYPHEN BULLET have HYPHEN-MINUS in its
NamesList?
The comments at the start of NamesList.txt say that it is
“semi-automatically derived from UnicodeData.txt”,
2013-02-18 17:36, Shriramana Sharma wrote:
On Mon, Feb 18, 2013 at 7:13 PM, Erkki I Kolehmainen e...@iki.fi wrote:
It may also be the result of a negotiating process within a special purpose
user group.
I also see no problem with the current definition. Since the whole
point of the standard
2013-02-16 11:38, Stephan Stiller wrote:
(By the way, for those finding the German rule to write SS
unsatisfactory: It's hard to come by an actual minimal pair.
Example: Strauss vs. Strauß. Originally the same name, but two spellings
make them two names that may need to be distinguished from
2013-02-13 21:31, Andries Brouwer wrote:
I wondered how to code an s-j overstrike combination in Unicode.
Attached a photograph of some text containing this combination.
It looks like something that has not been encoded. The same applies to
what seems to be an eth (ð) with a stroke, and
2013-02-07 12:21, Raymond Mercier wrote:
This problem is not precisely about Unicode - or is it?
Directionality of characters is a Unicode issue.
If I have a Hebrew text displayed in Adobe Acrobat I can select part of
it and can paste it into Word. The trouble is that while individual
2013-01-25 2:41, Richard Wordingham wrote:
On Thu, 24 Jan 2013 20:05:41 -0300
Andrés Sanhueza peroyomasli...@gmail.com wrote:
Do you think that a end of story symbol may be feasible/useful?
One such symbol is already encoded, the Halmos tombstone U+220E END OF
PROOF.
It is one of the many
2013-01-23 3:55, h...@tbbs.net wrote:
There is a bullet that often is uzed in local advertizing.
It separates phrases as em-dash. In the plane writs where it is uzed,
it is also equivalent to line-break:
DRAIN-CLEANING (O) GENERAL PLUMBING
or
LAWNMOWER REPAiR (O)
2013-01-16 1:18, James Lin wrote:
I have 2 fundamental questions.
I’ll address the first one only.
HTML5 supports isolation tag bdi,
The HTML5 drafts have it, but browser support is still limited. As
described at
2013-01-16 3:03, Phillips, Addison wrote:
Code points 2066, 2067, and 2068 are unassigned. I presume you mean
U+202B RIGHT-TO-LEFT EMBEDDING (RLE) and U+202C POP DIRECTIONAL
FORMATTING.
As Roozbeh pointed out, he means the characters added that provide bidi
isolation.
I see. The code
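Since Unicode 6.3, U+2066..U+2069 are assigned to the bidi isolate controls (LRI, RLI, FSI, PDI), while U+202B and U+202C remain the older embedding controls RLE and PDF. A Python sketch that lists their Bidi_Class values and wraps a string in FSI...PDI; the isolate() helper is hypothetical, for illustration only.

```python
import unicodedata

FSI, PDI = "\u2068", "\u2069"

def isolate(text: str) -> str:
    """Hypothetical helper: wrap text in FIRST STRONG ISOLATE ... POP DIRECTIONAL ISOLATE."""
    return f"{FSI}{text}{PDI}"

for cp in (0x202B, 0x202C, 0x2066, 0x2067, 0x2068, 0x2069):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} bidi={unicodedata.bidirectional(ch)}")

wrapped = isolate("\u05E9\u05DC\u05D5\u05DD")  # Hebrew "shalom", isolated from its context
```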
2013-01-11 0:28, Elbrecht wrote:
any help with an unknown character - very appreciated:
elbrecht.com/SW.png [400KB]
You probably tried to attach an image, but it was not sent or it was
stripped off by the mailing list software. Please upload the image in
some
2013-01-11 1:04, Elbrecht wrote:
the URL is:
www.elbrecht.com/SW.png
Well, the *URL* is
http://www.elbrecht.com/SW.png
or
http://elbrecht.com/SW.png
(I really thought it was just a local filename when I saw your first email.)
problem
2013-01-09 2:55, Leif Halvard Silli wrote:
The benefit of doing such a comparison is that we then get to
count both the HTML page *plus* all the extra fonts that is included in
the romanized Singhala file. Thus, we get a more *real* basis for
comparing the relative size of the two pages.
Not
2013-01-09 11:57, Leif Halvard Silli wrote:
Not sure which fallacy you have identified - see below.
I was referring to the comparison between an ad hoc 8-bit encoding and a
Unicode encoding in which you count the sizes of font files in the first
case only. I’m a bit confused by your comparison, which
2013-01-08 23:56, Naena Guru wrote:
May I ask if the following two are Latin script, English or Singhala?
1. This is written in English.
2. mee laþingaþa síhalayi.
For me, both are Latin script and 1 is English and 2 is Singhala (says,'
this is romanized Singhala').
Text 2 is
2013-01-02 8:35, Asmus Freytag wrote:
On 1/1/2013 3:53 PM, Naena Guru wrote:
(By the way, Unicode is quietly suppressing Basic Latin block by
removing it from the Latin group at top of the code block page
(http://www.unicode.org/charts/) and hiding it under different names
in the lower part of
2013-01-03 0:22, Markus Scherer wrote:
On Wed, Jan 2, 2013 at 1:25 PM, Jukka K. Korpela jkorp...@cs.tut.fi
wrote:
Then again, Latin is no different from Cyrillic, Greek, or Arabic,
for example, in this respect. In an apparent attempt to save space
2012-12-30 23:22, Costello, Roger L. wrote:
I have heard it stated that, in the context of character encoding and decoding:
Interoperability is getting better.
Where? It seems that this is what *you* are saying.
Do you have data to back up the assertion that interoperability is
2012-12-23 18:09, Karl Williamson wrote:
As another poster said, this quotation
would be considered fair use under USA law.
It was not a quotation but an excerpt posted without permission.
Quotations are allowed when they are needed to back up your statements
or specify what you are
2012-12-22 23:56, Costello, Roger L. wrote:
I figure the people on this list can truly appreciate this:
I don’t. You are posting an excerpt from a copyrighted book as such, not
as a legal quotation for an acceptable purpose. Moreover, you have
distorted the text. For example:
Homo
2012-12-21 21:05, Leif Halvard Silli wrote:
My Moscow Russian-Norwegian from 1987 and my Pocket Oxford Russian
Dictionary from 2003 agree that both list words on Ё and Е under the
same category – namely, under the letter Е.
This appears to be the case in any serious dictionary.
The use of
2012-12-20 12:52, Martinho Fernandes wrote:
I was wondering if there is a list of character names translated into
other languages somewhere. Is there?
The standard ISO 10646, which is equivalent to Unicode as regards
character names, is published in French, too. According to
2012-12-20 16:41, Andreas Prilop wrote:
On Thu, 20 Dec 2012, Jukka K. Korpela wrote:
http://www.ling.helsinki.fi/filt/info/mes2/
Unicode names have certain restrictions (capital ASCII letters, etc).
This Finnish list even uses non-ASCII characters but sticks to
capital letters. Why no small
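The restriction can be stated precisely: Unicode character names use only uppercase A-Z, digits 0-9, space, and hyphen-minus. A Python sketch of that syntax check (the function name is illustrative):

```python
import re
import unicodedata

NAME_RE = re.compile(r"[A-Z0-9 -]+")

def valid_name_syntax(name: str) -> bool:
    """Check that a name uses only the repertoire allowed for Unicode names."""
    return NAME_RE.fullmatch(name) is not None

print(valid_name_syntax(unicodedata.name("\u00E4")))  # True: LATIN SMALL LETTER A WITH DIAERESIS
print(valid_name_syntax("LATIN SMALL LETTER Ä"))      # False: non-ASCII letter
```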
2012-12-20 14:13, David Starner wrote:
It may be useful to try to agree on official or semi-official names for
characters in a language. Such a list hardly needs to cover all of the over
100,000 Unicode characters.
Why not? Why should an English speaker sticking an arbitrary character
into a
2012-12-20 17:59, Asmus Freytag wrote:
Character names serve two purposes, which are sometimes at odds. One is
to simply act as formal identifiers that are more or less mnemonic
(which the hex codes are not). The other is an aid in identifying a
character, as an aid in look-up or selection.
2012-12-21 2:45, Asmus Freytag wrote:
But when real people, not biologists, want to look up information they
have precisely two choices: they can look at a visual index (for species
that can be arranged visually) or they can look up the scientific name
for the species based on the only thing
2012-11-24 8:12, Masatoshi Kimura wrote:
According to TUS v6.2 clause 16.4,
http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf#page=15
The base character in a variation sequence is never a
combining character or a decomposable character.
However, the following base characters appearing in
2012-11-17 0:20, Michael Everson wrote:
On 16 Nov 2012, at 22:12, Buck Golemon b...@yelp.com wrote:
That's my personal understanding as well, but can you help me find
documentation that I can show to my skeptical workmates?
It is the basis for most popular 8-bit character sets, including
2012-10-16 13:06, Christopher Fynn wrote:
On Windows I use Andrew West's Babel Pad
http://www.babelstone.co.uk/Software/BabelPad.html
As far as I can see, the “Encoding” menu in “Save As” in BabelPad has
just a small set of encodings to choose from, basically just UTF-8 and
UTF-16 and
2012-10-09 20:32, Bill Poser wrote:
No, I was contrasting the behaviour of s followed by U+0332, for which
there is no precomposed letter, with U+1E95, which is the precomposed
equivalent of z followed by U+0332.
You meant to write “followed by U+0331” at the end. But in any case,
this is a
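The decomposition data confirms the point: U+1E95 is canonically equivalent to z + U+0331 COMBINING MACRON BELOW, not z + U+0332, and s + U+0332 has no precomposed form, so normalization leaves it alone. Checked with Python's unicodedata:

```python
import unicodedata

z_line = "\u1E95"  # LATIN SMALL LETTER Z WITH LINE BELOW
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", z_line)])  # ['U+007A', 'U+0331']

s_low = "s\u0332"  # s + COMBINING LOW LINE, no precomposed equivalent
print(unicodedata.normalize("NFC", s_low) == s_low)  # True
```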
2012-10-08 20:47, Andreas Prilop wrote:
I found some DejaVu bug reports where a developer called
Ben Laenen suggests the nonzero advance width is intentional
I wonder why. Other combining marks in DejaVu Sans Mono
do not have such a problem; see
2012-10-08 21:49, Andreas Prilop wrote:
On Mon, 8 Oct 2012, Jukka K. Korpela wrote:
http://www.user.uni-hannover.de/nhtcapri/combining-marks.html
Your test page is interesting, but it postulates the use
of style sheet switching,
You are always free to define your preferred font family
2012-10-07 8:38, Bill Poser wrote:
I have a web page that writes into an HTML5 textarea via the javascript
dom interface. U+0332 COMBINING LOW LINE is incorrectly rendered as a
spacing low line in both Mozilla Firefox and Google Chrome
The issue is not limited to textareas but appears in
2012-10-07 11:51, Michael Everson wrote:
The issue is not limited to textareas but appears in normal text
too, when the font is set to Courier New. You can also see the problem
in Microsoft Word, for example, when using that font. The point is that
this is a font problem, and you can see it
2012-09-07 21:16, Richard Wordingham wrote:
Some reasons for romanizing:
[snip]
3. Make the language accessible to those who are not familiar with the
script
The rest of the post is irrelevant. Transliterations from Semitic
languages have been established for this reason, and possibly
2012-09-06 23:47, Mark Davis ☕ wrote:
The distinction between transliteration and transcription is limited
to a few people.
Maybe, but I see that distinction clearly made in Finnish national
standards, for example, and it is a useful one.
It is far better to use unambiguous terms, like
2012-09-07 0:59, Mark Davis ☕ wrote:
They might be distinct in Finnish, but in English only in specialized
contexts,
This is not about everyday language (which is irrelevant in this
context) but about the language used in national standards.
Best to use terms that will be understood by
2012-09-07 1:54, Mark Davis ☕ wrote:
This might come off as a bit snarky, but do you /really/ think the
author and every one of the commentators on the thread all really meant
the following?
Compiling a list of Semitic transliteration characters, but
restricted to only those
2012-08-17 1:44, Ian Clifton wrote:
Andreas Prilop prilop4...@trashmail.net writes:
On Thu, 16 Aug 2012, Ian Clifton wrote:
Having just been to Norway, and wanting to email my friends all
about it, I came across a curiosity: neither of the combining
characters U+0337, U+0338 seem to work in
2012-08-16 18:31, Ian Clifton wrote:
Having just been to Norway, and wanting to email my friends all about
it, I came across a curiosity: neither of the combining characters
U+0337, U+0338 seem to work in usually‐reliable Emacs, and indeed
U+00F8 LATIN SMALL LETTER O WITH STROKE doesn’t seem to
2012-08-16 20:53, Cristian Secară wrote:
On Thu, 16 Aug 2012 19:32:15 +0300, Erkki I Kolehmainen wrote:
Although the stroke is not a diacritic, keyboard drivers can be made
to generate atomic characters with stroke by using a dead letter key
for stroke together with the base
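This is also why a keyboard driver has to produce the atomic character: letters with stroke have no canonical decomposition, so a base letter followed by U+0338 COMBINING LONG SOLIDUS OVERLAY is not normalized into U+00F8. A check with Python's unicodedata:

```python
import unicodedata

for ch in ("\u00F8", "\u0111", "\u0142"):  # ø, đ, ł
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"decomposition={unicodedata.decomposition(ch)!r}")  # empty for all three

composed = unicodedata.normalize("NFC", "o\u0338")
print(composed == "\u00F8")  # False: no canonical composition exists
```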
2012-08-14 22:56, Robert Wheelock wrote:
The _tonos_ (overtick) is a STRAIGHT 90º accent mark, whereas the
_oxeia_ (acute) is usually slanted at 45º.
It is somewhat tragicomic that you make the mistake of using masculine
ordinal indicator U+00BA in place of the degree sign U+00B0, when
2012-07-26 13:04, Andre Schappo wrote:
Not emoticon but …….
I received an email from Email Insider. Email was written as E✉ail
✉ being U+2079
I thought it quite clever
U+2079 is SUPERSCRIPT NINE “⁹”. I suppose you meant U+2709 ENVELOPE “✉”,
an old (Unicode 1.0.0) dingbat (which now
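Look-alike mix-ups such as º for ° or U+2079 for U+2709 are easy to catch by listing the code point and name of each character. A small Python helper for that (the name describe() is illustrative):

```python
import unicodedata

def describe(text: str) -> list:
    """Return 'U+XXXX NAME' for each character, to spot look-alikes."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

print(describe("º°"))
print(describe("\u2079\u2709"))
```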
2012-07-26 0:19, Steven Atreju wrote:
And that was an Unicode BOM that has been converted to UTF-8 and
then been converted to UTF-8 once again.
Apparently the problem is that the data has been doubly encoded: first
into UTF-8, then interpreting the bytes of UTF-8 data, interpreting
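The double encoding can be reproduced in a few lines. This Python sketch assumes the intermediate misinterpretation was ISO-8859-1 (windows-1252 gives the same result for these three bytes):

```python
bom = "\uFEFF"                   # BYTE ORDER MARK (ZERO WIDTH NO-BREAK SPACE)
once = bom.encode("utf-8")       # b'\xef\xbb\xbf'
# The mistake: read those bytes as ISO-8859-1 text, then encode to UTF-8 again.
twice = once.decode("latin-1").encode("utf-8")
print(twice)                     # b'\xc3\xaf\xc2\xbb\xc2\xbf'
print(twice.decode("utf-8"))     # ï»¿

# Reversing the extra round trip recovers the original character.
repaired = twice.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == bom)           # True
```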
2012-07-20 19:52, Philippe Verdy wrote:
The Subject fi[el]d is subject to special encoding like
Quoted-Printable or Base64 using specific prefixes.
This is a matter of character encoding. All plain text inevitably has
some encoding, and the encoding may vary without changing the plain text
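For mail headers, the transfer form is an RFC 2047 encoded-word, but decoding it yields the same plain text. A sketch with Python's email.header module; the exact encoded string (Q or B encoding) depends on the library's choice, so only the round trip is relied on here.

```python
from email.header import Header, decode_header, make_header

subject = "Re: Ääkköset"
encoded = Header(subject, "utf-8").encode()
print(encoded)  # an encoded-word of the form =?utf-8?...?=

# Decoding the encoded-word gives back the original plain text.
roundtrip = str(make_header(decode_header(encoded)))
print(roundtrip == subject)  # True
```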