I think it would be useful to actually write down an overview of the
recommended implementation approach for handling ALL the different uses
for middle dot, and to make sure that what is recommended is not only
theoretically possible, but acceptable and accepted(!) as best practice by
The question is who would be able to take on the drafting of a document
that explains the recommended usage of 00B7 for the various purposes
(including recommended ways of getting the correct rendering and
processing).
ONLY by having such a document is it possible to be certain that the
2013/3/27 Asmus Freytag asm...@ix.netcom.com:
At the moment, the statement that the existing encoding is actually
implementable is something that must be considered unproven (enough issues
have been pointed out for various elements of the unification already to
allow such a conclusion).
What
On 3/27/2013 12:07 PM, Philippe Verdy wrote:
2013/3/27 Asmus Freytag asm...@ix.netcom.com:
At the moment, the statement that the existing encoding is actually
implementable is something that must be considered unproven (enough issues
have been pointed out for various elements of the unification
On Fri, 22 Mar 2013 18:49:24 -0700
Asmus Freytag asm...@ix.netcom.com wrote:
On 3/22/2013 6:17 PM, Richard Wordingham wrote:
On Fri, 22 Mar 2013 18:01:14 -0700
Asmus Freytag asm...@ix.netcom.com wrote:
On 03/21/2013 04:48 PM, Richard Wordingham wrote:
However, distinguishing U+00B7 and
On 23 Mar 2013, at 01:01, Asmus Freytag asm...@ix.netcom.com wrote:
Let's get back to the interesting question:
Is it possible to correctly process text that uses 00B7 for ANO TELEIA, or is
this fundamentally impossible? If so, under what scenario?
It is possible to process text without
2013/3/23 Michael Everson ever...@evertype.com:
It is possible to process text without Unicode at all, using sets and sets
of 8-bit font-hack fonts. We all did it for years.
What a deceptive solution! Without abandoning Unicode, it would be
much simpler to use PUA characters and custom fonts
On Friday, 22 March 2013 at 17:29, Richard Wordingham wrote:
RW Is there evidence of conscious
RW distinction of U+02BC MODIFIER LETTER APOSTROPHE and U+2019 RIGHT SINGLE
RW QUOTATION MARK ...
See: http://commons.wikimedia.org/wiki/File:Okina-using-Linux-Libertine.svg
-- Karl P.
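One practical consequence of the distinction Richard asks about can be sketched in Python with only the standard `re` and `unicodedata` modules: U+02BC is a letter (general category Lm), while U+2019 is closing punctuation (Pf), so a default Unicode-aware word pattern keeps the former inside a word and splits at the latter. The sample strings and the `words` helper are assumptions for illustration.

```python
import re
import unicodedata

APOSTROPHE_LETTER = "\u02bc"  # MODIFIER LETTER APOSTROPHE
QUOTE_MARK = "\u2019"         # RIGHT SINGLE QUOTATION MARK

def words(text: str) -> list[str]:
    # On str patterns, \w matches whatever str.isalnum() accepts (plus "_"),
    # so a character of category Lm counts as part of a word.
    return re.findall(r"\w+", text)

print(unicodedata.category(APOSTROPHE_LETTER))  # Lm
print(unicodedata.category(QUOTE_MARK))         # Pf
print(words("ka" + APOSTROPHE_LETTER + "ai"))   # one token
print(words("ka" + QUOTE_MARK + "ai"))          # ['ka', 'ai']
```

The same text therefore tokenizes differently depending on which of the two look-alike characters the author chose.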
On 3/23/2013 4:55 AM, Michael Everson wrote:
On 23 Mar 2013, at 01:01, Asmus Freytag asm...@ix.netcom.com wrote:
Let's get back to the interesting question:
Is it possible to correctly process text that uses 00B7 for ANO TELEIA, or is
this fundamentally impossible? If so, under what
On 3/21/2013 4:22 PM, Philippe Verdy wrote:
2013/3/21 Richard Wordingham richard.wording...@ntlworld.com:
Further, the code chart glyphs for the ANO TELEIA and the MIDDLE DOT
differ, see attachment. If they are canonically equivalent, and one
is a mandatory decomposition of the other, why do
2013/3/22 Asmus Freytag asm...@ix.netcom.com:
Semantic selectors are pure pseudo-coding, because if the semantic
differentiation is needed it is needed in plain text - and then it should be
expressible in plain character codes.
We don't disagree; that's exactly what I meant here: plain
2013/3/22 Asmus Freytag asm...@ix.netcom.com:
If you need to annotate text with the results of semantic analysis as
performed by a human reader, then you either need XML, or some other format
that can express that particular intent.
Absolutely not. If this encodes semantics, this is part of
2013/3/22 Asmus Freytag asm...@ix.netcom.com:
The number of conventions that can be applicable to certain punctuation
characters is truly staggering, and it seems unlikely that Unicode is the
right place to
a) discover all of them or
b) standardize an expression for them.
My intent is
On 3/22/2013 4:02 AM, Philippe Verdy wrote:
2013/3/22 Asmus Freytag asm...@ix.netcom.com:
Semantic selectors are pure pseudo-coding, because if the semantic
differentiation is needed it is needed in plain text - and then it should be
expressible in plain character codes.
We don't disagree,
On 3/22/2013 4:08 AM, Philippe Verdy wrote:
2013/3/22 Asmus Freytag asm...@ix.netcom.com:
If you need to annotate text with the results of semantic analysis as
performed by a human reader, then you either need XML, or some other format
that can express that particular intent.
Absolutely not. If
On 3/22/2013 4:16 AM, Philippe Verdy wrote:
2013/3/22 Asmus Freytag asm...@ix.netcom.com:
The number of conventions that can be applicable to certain punctuation
characters is truly staggering, and it seems unlikely that Unicode is the
right place to
a) discover all of them or
b) standardize an
On Fri, 22 Mar 2013 12:08:14 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
adding new variants of existing characters, as was done
specifically for maths, is not a stable long-term solution; solutions
similar to variation selectors, however, are much more meaningful, and
will allow for
On 03/21/2013 04:48 PM, Richard Wordingham wrote:
For linguistic analysis, you need the normalisation appropriate to the
task. This is a case where Unicode normalisation generally throws away
information (namely, how the author views the characters), whereas in
analysing Burmese you may want to
On 3/22/2013 12:08 PM, Karl Williamson wrote:
On 03/21/2013 04:48 PM, Richard Wordingham wrote:
For linguistic analysis, you need the normalisation appropriate to the
task.
Linguistic analysis (in general) being a hugely complex undertaking,
mere normalization pales in comparison, so
On Fri, 22 Mar 2013 13:08:01 -0600
Karl Williamson pub...@khwilliamson.com wrote:
This is the first time I've heard someone suggest that one can
tailor normalizations.
I think the officially acceptable term is 'folding'. One would
not be 'tailoring a Unicode normalisation', but subverting the
On Fri, 22 Mar 2013 18:01:14 -0700
Asmus Freytag asm...@ix.netcom.com wrote:
On 03/21/2013 04:48 PM, Richard Wordingham wrote:
However, distinguishing U+00B7 and U+0387 would fail spectacularly
if the text had been converted to NFC before you received it.
That's a claim for which
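Richard's claim can be checked directly with Python's standard `unicodedata` module: U+0387 has a singleton canonical decomposition to U+00B7, so every normalization form replaces it, and the distinction cannot be recovered afterwards. This is a minimal illustration added for clarity, not code from the thread; the sample word is an assumption.

```python
import unicodedata

ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"  # MIDDLE DOT

# The character database records a canonical singleton mapping 0387 -> 00B7.
print(unicodedata.decomposition(ANO_TELEIA))  # 00B7

# Singletons are never recomposed, so every form yields U+00B7.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, ANO_TELEIA) == MIDDLE_DOT)  # True each time

# After NFC, nothing in the text records that the author typed U+0387.
received = unicodedata.normalize("NFC", "λόγος" + ANO_TELEIA)
print(ANO_TELEIA in received, received.endswith(MIDDLE_DOT))  # False True
```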
On 3/22/2013 6:17 PM, Richard Wordingham wrote:
On Fri, 22 Mar 2013 18:01:14 -0700
Asmus Freytag asm...@ix.netcom.com wrote:
On 03/21/2013 04:48 PM, Richard Wordingham wrote:
However, distinguishing U+00B7 and U+0387 would fail spectacularly
if the text had been converted to NFC before
With my apologies: read "too low" in the first sentence.
jk
On 3/21/13 9:20 AM, Kalvesmaki, Joel kalvesma...@doaks.org wrote:
FWIW, I worked with an expert in Greek paleography last year who insisted
that the ano teleia (U+0387) in an array of various Unicode-compliant OTFs
I showed him were
On Wed, 20 Mar 2013 20:49:32 -0600
Karl Williamson pub...@khwilliamson.com wrote:
Now back to processing general text. Doing any serious analysis of
text will require using regular expressions. That means normalizing
the input, as UTS 18 finally now says.
I think that change may be
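If, as Karl says UTS #18 now recommends, the input is normalized before regular-expression matching, then the pattern has to be normalized the same way; otherwise a literal U+0387 in the pattern can never match. A minimal Python sketch using the standard `re` and `unicodedata` modules; `nfc_search` is a hypothetical helper, and the Greek sample text is an assumption.

```python
import re
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def nfc_search(pattern: str, text: str):
    # Hypothetical helper: normalize both sides so they agree on U+00B7.
    # Only safe for simple literal patterns like the one used here.
    return re.search(nfc(pattern), nfc(text))

text = "ναι\u0387 οχι"  # Greek text containing an ano teleia
pattern = "\u0387"      # a search for the ano teleia

print(re.search(pattern, nfc(text)))          # None: NFC turned it into U+00B7
print(nfc_search(pattern, text) is not None)  # True
```

Note that the match found by `nfc_search` is a match for U+00B7, so it also hits every middle dot in the text, whatever its intended role.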
2013/3/21 Richard Wordingham richard.wording...@ntlworld.com:
Further, the code chart glyphs for the ANO TELEIA and the MIDDLE DOT
differ, see attachment. If they are canonically equivalent, and one
is a mandatory decomposition of the other, why do they have differing
glyphs?
Because the
On 03/09/2013 07:52 PM, Richard Wordingham wrote:
On Sat, 09 Mar 2013 16:21:17 -0700
Karl Williamson pub...@khwilliamson.com wrote:
Sorry for the delayed reply; I've been under deadline
Rendering is not the only consideration. Processing textual content
for 0387 is broken because it is
On Mon, 11 Mar 2013 05:27:35 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/3/10 Richard Wordingham richard.wording...@ntlworld.com:
If we unify U+00B7's three possible roles of (a) digraph breaker,
(b) ano teleia and (c) decimal point, we could have the following
scheme:
(1)
2013/3/11 Richard Wordingham richard.wording...@ntlworld.com:
On Mon, 11 Mar 2013 05:27:35 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/3/10 Richard Wordingham richard.wording...@ntlworld.com:
If we unify U+00B7's three possible roles of (a) digraph breaker,
(b) ano teleia and (c)
On Sat, 9 Mar 2013 18:58:45 -0700
Doug Ewell d...@ewellic.org wrote:
Richard Wordingham wrote:
The general feeling seems to be that computers don't do proper
decimal points, and so the raised decimal point is dropping out of
use.
Any discussion of whether computers handle decimal points
Should the Unicode Consortium decide to recommend an existing (or new)
character as a raised decimal for numbers, we would add that to CLDR, and
recommend that implementations accept either one as equivalent when parsing.
Mark https://plus.google.com/114199149796022210033
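Mark's suggestion that parsers accept either form as equivalent could look roughly like this in Python. The thread does not settle which character would be recommended as the raised decimal, so this sketch simply assumes U+00B7 alongside U+002E; `parse_decimal` is a hypothetical name, not a CLDR or library API.

```python
from decimal import Decimal, InvalidOperation

# Assumption: U+00B7 MIDDLE DOT is the raised decimal point being accepted.
DECIMAL_SEPARATORS = {".", "\u00b7"}

def parse_decimal(s: str) -> Decimal:
    """Parse a number whose decimal point is either U+002E or U+00B7."""
    s = s.strip()
    # More than one separator (e.g. "1.2·3") is treated as malformed.
    if sum(1 for c in s if c in DECIMAL_SEPARATORS) > 1:
        raise ValueError(f"ambiguous number: {s!r}")
    try:
        return Decimal(s.replace("\u00b7", "."))
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {s!r}") from exc

print(parse_decimal("23\u00b74"))                         # 23.4
print(parse_decimal("23.4") == parse_decimal("23\u00b74"))  # True
```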
On 2013-03-10, Richard Wordingham richard.wording...@ntlworld.com wrote:
The question is what users will demand. Expectations have been low
enough that the loss of decimal points has been accepted.
Additionally, striving for an apparently hard to get raised decimal
point risks being forced to
Oh, now I understand your comment. Matrix multiplication has no dot (and
uses juxtaposition); the inner (scalar) product uses · , and the cross
product uses × .
I was taught to use × for matrix multiplication (Computer Science, Hungary).
Á
However, for fully correct math layout, to require math mode (i.e.
global markup selecting math layout) is an appropriate restriction and
some minor infidelities in pure plain text rendering of math are
therefore tolerable.
I don't think the mere existence of a raised dot used as a decimal
2013-03-10 4:57, Asmus Freytag wrote:
'The Lancet' reportedly insists on the use of the raised decimal point
[…
That's sensible advice, in a way, because B7 is in 8859-1 and therefore
supported in a huge variety of fonts; for practical purposes, the
coverage among non-decorative text fonts is
2013/3/10 Richard Wordingham richard.wording...@ntlworld.com:
On Sun, 10 Mar 2013 17:22:05 +0200
Jukka K. Korpela jkorp...@cs.tut.fi wrote:
2013-03-10 4:57, Asmus Freytag wrote:
'The Lancet' reportedly insists on the use of the raised decimal
point
[…
That's sensible advice, in a way,
Are there any widely available fonts that in non-specialist tools will
render the decimal point U+002E FULL STOP significantly above the
baseline when it is used as a decimal point? I count word processors
as non-specialist, and am not interested in special word processor
commands to make
2013/3/9 Richard Wordingham richard.wording...@ntlworld.com:
In a real example of such a font, how would one adjust the position so
that U+002E is on the baseline in section numbers but raised in genuine
decimal numbers? (This is not an idiosyncratic style.)
In fact I would have even thought
On Sat, 9 Mar 2013 18:23:27 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/3/9 Richard Wordingham richard.wording...@ntlworld.com:
In a real example of such a font, how would one adjust the position
so that U+002E is on the baseline in section numbers but raised in
genuine decimal
Richard,
the situation with the raised decimal point is a mess in Unicode.
I know that Mark thinks we have too many dots, but the reason this case
is a mess is because the unification with U+002E is both non-workable in
practice and runs counter to precedent.
The precedent in Unicode is to
RW == Richard Wordingham richard.wording...@ntlworld.com writes:
RW Are there any widely available fonts that in non-specialist tools
RW will render the decimal point U+002E FULL STOP significantly above
RW the baseline when it is used as a decimal point?
I'm not aware of any, and I haven't
Brits use a baseline dot for multiplication and a middle dot for a decimal
point, given that we Yanks do the exact opposite (at least in handwriting)
I find × more common in the US. But it probably depends on the school
context or discipline/field.
Stephan
2013-03-09 21:30, Asmus Freytag wrote:
I believe the Unicode Standard should be fixed by explicitly removing
all suggestions in the text that the raised decimal point is unified
with 002E.
That would be a good move if agreement can be found on the recommended
coding of the middle dot.
SS == Stephan Stiller stephan.stil...@gmail.com writes:
SS I find × more common in the US. But it probably depends on the school
SS context or discipline/field.
It is common, but I only experienced it in handwriting in K-6 (maybe K-8).
In high school and at uni everyone used · or just
On Sat, 09 Mar 2013 15:09:53 -0500
James Cloos cl...@jhcloos.com wrote:
Apropos having two renderings for U+002E:
... note that if nothing
else is suitable, one always could use a StylisticSet to do the
substitution.
That would be a better way of handling it than a false language
setting.
SS I find × more common in the US. But it probably depends on the school
SS context or discipline/field.
It is common, but I only experienced it in handwriting in K-6 (maybe K-8).
In high school and at uni everyone used · or just juxtaposition.
Okay, at US-American college, all printed
On 3/9/2013 1:51 PM, Jukka K. Korpela wrote:
2013-03-09 21:30, Asmus Freytag wrote:
I believe the Unicode Standard should be fixed by explicitly removing
all suggestions in the text that the raised decimal point is unified
with 002E.
That would be a good move if agreement can be found on the
On Sat, 09 Mar 2013 14:16:27 -0800
Stephan Stiller stephan.stil...@gmail.com wrote:
A very weird notation I encountered in the US (and this must be
predominantly K-12 notation, though it survives into a few
college-level texts of the solution manual type) is to write
multiplication as 3(4)(5)
2013/3/9 Asmus Freytag asm...@ix.netcom.com:
This appears to be another possible mistake. However, the Greek script does
provide a context which could be used to select the ano teleia appearance
and properties (unless you tell me that the character appears in Greek
surrounded by non-Greek
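A rough sketch of the context-based disambiguation discussed here, covering the three roles of U+00B7 listed earlier in the thread (digraph breaker, ano teleia, decimal point). The rules are heuristics assumed for illustration, not anything the Unicode Standard specifies; `classify_middle_dot` is a hypothetical name, and a Greek letter is recognized simply by its character name starting with "GREEK".

```python
import unicodedata

MIDDLE_DOT = "\u00b7"

def _is_greek_letter(ch: str) -> bool:
    # Heuristic: Greek letters have names beginning with "GREEK".
    # isalpha() is checked first, so an empty string never reaches name().
    return ch.isalpha() and unicodedata.name(ch, "").startswith("GREEK")

def classify_middle_dot(text: str, i: int) -> str:
    """Guess the role of the U+00B7 at index i from its neighbours."""
    if text[i] != MIDDLE_DOT:
        raise ValueError("not a middle dot")
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    if before.isdigit() and after.isdigit():
        return "decimal point"
    if _is_greek_letter(before):
        return "ano teleia"
    if before.lower() == "l" and after.lower() == "l":
        return "digraph breaker"  # Catalan "l·l"
    return "unknown"

print(classify_middle_dot("23\u00b74", 2))        # decimal point
print(classify_middle_dot("ναι\u00b7 οχι", 3))    # ano teleia
print(classify_middle_dot("col\u00b7lecció", 3))  # digraph breaker
```

Because it only looks at U+00B7, the same heuristic applies to text in which an original U+0387 has already been folded into U+00B7 by normalization; it will of course misclassify text that mixes scripts around the dot.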
On Sat, Mar 09, 2013 at 12:19:31PM +, Richard Wordingham wrote:
Are there any widely available fonts that in non-specialist tools will
render the decimal point U+002E FULL STOP significantly above the
baseline when it is used as a decimal point?
British typographic instructions for the
On 3/9/2013 3:41 PM, Philippe Verdy wrote:
2013/3/9 Asmus Freytag asm...@ix.netcom.com:
This appears to be another possible mistake. However, the Greek script does
provide a context which could be used to select the ano teleia appearance
and properties (unless you tell me that the character
On Sat, 09 Mar 2013 14:41:11 -0800
Asmus Freytag asm...@ix.netcom.com wrote:
On 3/9/2013 1:51 PM, Jukka K. Korpela wrote:
2013-03-09 21:30, Asmus Freytag wrote:
I wonder what character and techniques British publishers use to
produce notations with a raised dot. Is it 002E, with
2013/3/10 Asmus Freytag asm...@ix.netcom.com:
On 3/9/2013 3:41 PM, Philippe Verdy wrote:
2013/3/9 Asmus Freytag asm...@ix.netcom.com:
This appears to be another possible mistake. However, the Greek script
does provide a context which could be used to select the ano teleia
appearance and
Richard Wordingham wrote:
The general feeling seems to be that computers don't do proper decimal
points, and so the raised decimal point is dropping out of use.
Any discussion of whether computers handle decimal points properly
can't happen without talking about number-to-string conversion
'The Lancet' reportedly insists on the use of the raised decimal point
(http://www.download.thelancet.com/flatcontentassets/authors/artwork-guidelines.pdf)
and gives the instructions 'Type decimal points midline (ie, 23·4, not
23.4). To create a midline decimal on a PC: hold down ALT key and
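Producing output in this style ("23·4, not 23.4") from numeric data is the mirror image of parsing: format with an ordinary full stop, then substitute the raised point. A minimal sketch, assuming U+00B7 is the character wanted (the reply above notes it is B7 in ISO 8859-1; the Lancet keystroke instruction itself is truncated here); `format_midline` is a hypothetical helper.

```python
from decimal import Decimal

MIDDLE_DOT = "\u00b7"

def format_midline(value: Decimal, places: int = 1) -> str:
    """Format a number with a raised (midline) decimal point."""
    # Fixed-point formatting rounds to `places` digits and emits exactly one
    # full stop, so a plain replace is safe for this format specification.
    text = f"{value:.{places}f}"
    return text.replace(".", MIDDLE_DOT)

print(format_midline(Decimal("23.4")))     # 23·4
print(format_midline(Decimal("0.05"), 2))  # 0·05
```

Using `Decimal` rather than `float` avoids binary rounding surprises in the printed digits.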
when teaching arithmetic × was typical, but when teaching elementary
algebra or higher math · was used
I would agree with this. Essentially, it seems like real math – to
the extent that it uses numbers beyond {0, 1, 2} in the first place
:-) – uses ·
Oh, now I understand your comment.
On Sat, 09 Mar 2013 16:21:17 -0700
Karl Williamson pub...@khwilliamson.com wrote:
Rendering is not the only consideration. Processing textual content
for 0387 is broken because it is considered to be an ID_Continue
character, whereas its Greek usage is equivalent to the English
semicolon,
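Karl's observation can be seen from Python, whose `str.isidentifier()` follows the Unicode XID_Start/XID_Continue properties: both U+00B7 and U+0387 have general category Po (punctuation), yet both may continue an identifier. A minimal check using only the standard library.

```python
import unicodedata

for ch in ("\u00b7", "\u0387"):
    print(
        unicodedata.name(ch),
        unicodedata.category(ch),         # Po: other punctuation
        ("x" + ch + "y").isidentifier(),  # True: allowed after the first character
        ch.isidentifier(),                # False: cannot start an identifier
    )
```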
On 3/9/2013 5:30 PM, Richard Wordingham wrote:
On Sat, 09 Mar 2013 14:41:11 -0800
Asmus Freytag asm...@ix.netcom.com wrote:
On 3/9/2013 1:51 PM, Jukka K. Korpela wrote:
2013-03-09 21:30, Asmus Freytag wrote:
I wonder what character and techniques British publishers use to
produce notations
Richard has given some cogent arguments below.
Another counter example is the use of : to form abbreviations in
Swedish. (It's inserted in the word to replace the elided part). In that
use, this punctuation character is suddenly part of a word.
To handle the full set of general cases, word
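A small Python sketch of the Swedish case: with a plain `\w+` pattern, the colon in an abbreviation such as "S:t" (Sankt) splits the word, while a tailored pattern that allows a colon followed by letters keeps it whole. The sample string and the tailored pattern are assumptions for illustration; requiring letters after the colon is one way to avoid gluing times such as "10:30" into a single word.

```python
import re

DEFAULT_WORD = re.compile(r"\w+")
# Assumed tailoring: a colon followed by letters ([^\W\d_]) stays inside the word.
SWEDISH_WORD = re.compile(r"\w+(?::[^\W\d_]+)*")

sample = "S:t Eriksgatan kl. 10:30"
print(DEFAULT_WORD.findall(sample))
# ['S', 't', 'Eriksgatan', 'kl', '10', '30']
print(SWEDISH_WORD.findall(sample))
# ['S:t', 'Eriksgatan', 'kl', '10', '30']
```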
On 3/9/2013 6:01 PM, Stephan Stiller wrote:
'The Lancet' reportedly insists on the use of the raised decimal point
(http://www.download.thelancet.com/flatcontentassets/authors/artwork-guidelines.pdf)
and gives the instructions 'Type decimal points midline (ie, 23·4, not
23.4). To create a
On 3/9/2013 5:47 PM, Philippe Verdy wrote:
2013/3/10 Asmus Freytag asm...@ix.netcom.com:
On 3/9/2013 3:41 PM, Philippe Verdy wrote:
2013/3/9 Asmus Freytag asm...@ix.netcom.com:
This appears to be another possible mistake. However, the Greek script
does provide a context which could be used to
The [...] exceptions just prove the rule
That's a semantically empty statement :-) And an exception can only be
an exception /to a particular rule/.
Mathematical layout has all sorts of little idiosyncratic rules about
spacing etc. that are subtly different from regular text, even though