Re: A sign/abbreviation for "magister"

2018-10-29 Thread Ken Whistler via Unicode



On 10/29/2018 8:06 PM, James Kass via Unicode wrote:
> could be typed on old-style mechanical typewriters.  Quintessential
> plain-text, that.


Nope. Typewriters were regularly used for underscoring and for 
strikethrough, both of which are *styling* of text, and not plain text. 
The mere fact that some visual aspect of graphic representation on a 
page of paper can be implemented via a mechanical typewriter does not, 
ipso facto, mean that particular feature is plain text. The fact that I 
could also implement superscripting and subscripting on a mechanical 
typewriter via turning the platen up and down half a line, also does not 
make *those* aspects of text styling plain text, either.


The same reasoning applies to handwriting, only more so.

--Ken



Re: A sign/abbreviation for "magister"

2018-10-29 Thread James Kass via Unicode



Asmus Freytag wrote,

> Nevertheless, I think the use of devices like combining underlines
> and superscript letters in plain text is best avoided.

That's probably true according to the spirit of the underlying encoding 
principles.  But hasn't that genie already left the bottle?


People write their names as they please.  With the entire repertoire of 
Unicode from which to choose, people are coming up with some amazingly 
unorthodox ways to "spell" their screen names.  Here's six screen names 
copy/pasted from an atypical Twitter account's comments sections:


Joё дмёгicди‏

I⃟MAGI⃟NER‏

Եօʍ‏

IXOYE444 (←This one included character U+200F, I removed it.)

Qęy ✪ eT ✪ Dog ► VOTES‏

  ❄️ Kἶოἶղმz  ❄️  ‏ (←Note the decorative emoji.)

People are mixing scripts and so forth in order to create distinctive 
screen names.  Those screen names are out there in the wild and are part 
of our stored data which future historians are welcome to scratch their 
heads over.


IIRC, around the time that the math alphanumerics were added to Plane 
One, Michael Everson noted that once characters are encoded people will 
use them as they see fit.  In this present thread, Michael Everson wrote:


> And I would not encode it as Mr͇, firstly because it
> would never render properly and you might as well
> encode it as Mr. or M:r, and second because in the
> IPA at least that character indicates an alveolar
> realization in disordered speech. (Of course it
> could be used for anything.)

Yes, it could be used for anything requiring combining-two-lines-below.  
At some point, if enough people were doing it, it would morph from a 
kludge of hacking alveolar whatevers into an accepted convention.  (Not 
that I am pushing this approach, it's only one suggestion out of many 
possibilities.  I'm in favor of direct encoding.)  I would not encode 
the abbreviation as either "Mr." or "M:r" because neither of those text 
strings appear in the original manuscript.


AFAICT, "Եօʍ‏" is pronounced just like "Tom", but it ain't spelled the 
same.  Likewise for "McCoy" and "M=ͨCoy".


It strikes me as perverse if "Եօʍ‏" can spell his name as he pleases 
using the UCS but "M=ͨCoy" mustn't.  Especially since names like 
"M=ͨCoy" and abbreviations such as "M=ͬ" could be typed on old-style 
mechanical typewriters.  Quintessential plain-text, that.
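
A side note on the U+200F removed from one of the screen names above: it is RIGHT-TO-LEFT MARK, an invisible format character (general category Cf). The following is only a minimal Python sketch, assuming nothing beyond the standard unicodedata module, of how such characters could be listed and stripped. The helper names describe and strip_format and the sample string are made up for illustration.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


def strip_format(text):
    """Drop invisible format characters (general category Cf), such as U+200F."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Hypothetical sample: a screen name followed by U+200F, written with an
# escape so that the invisible character shows up in the source.
name = "IXOYE444\u200f"
for row in describe(name[-2:]):
    print(row)
print(repr(strip_format(name)))
```

Stripping every Cf character is a blunt instrument: it would also remove U+200D ZERO WIDTH JOINER, which some emoji sequences rely on, so a real clean-up would probably want a narrower list.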




Re: A sign/abbreviation for "magister"

2018-10-29 Thread Philippe Verdy via Unicode
For the case of "Mister" vs. "Magister", the (double) underlining is not
just a stylistic option but conveys semantics as an explicit abbreviation
mark!
We are here at the line between pure visual encoding (e.g. using
superscript letters) and logical encoding (as done everywhere else in
Unicode with combining sequences; the best-known exception being the
Thai script, which uses the visual model).
Obviously the Latin script should not use any kind of visual encoding, and
even the superscript letters (initially introduced for something else,
notably as distinct symbols for IPA) were not the correct path (they also
have limitations because the set of superscript letters is quite small; the
same can be said about the visual encoding of mathematical symbols as
stylistic variants turned into plain characters, which will always be
incomplete, while they could as well be represented logically).
So Unicode does not have a consistent policy (and this inconsistency was
not introduced solely for legacy round-trip compatibility, as with the
Numero abbreviation or the encoding of the Thai script).


On Mon, 29 Oct 2018 at 12:44, Asmus Freytag via Unicode wrote:

> On 10/28/2018 11:50 PM, Martin J. Dürst via Unicode wrote:
>
> On 2018/10/29 05:42, Michael Everson via Unicode wrote:
>
> This is no different from the Irish name McCoy which can be written MᶜCoy where 
> the raising of the c is actually just decorative, though perhaps it was once 
> an abbreviation for Mac. In some styles you can see a line or a dot under the 
> raised c. This is purely decorative.
>
> I would encode this as Mʳ if you wanted to make sure your data contained the 
> abbreviation mark. It would not make sense to encode it as M=ͬ or anything 
> else like that, because the “r” is not modifying a dot or a squiggle or an 
> equals sign. The dot or squiggle or equals sign has no meaning at all. And I 
> would not encode it as Mr͇, firstly because it would never render properly 
> and you might as well encode it as Mr. or M:r, and second because in the IPA 
> at least that character indicates an alveolar realization in disordered 
> speech. (Of course it could be used for anything.)
>
>
> I think this may depend on actual writing practice. In German at least,
> it is customary to have dots (periods) at the end of abbreviations, and
> using any other symbol, or not using the dot, would be considered an error.
>
> The question of how to encode that dot is fortunately an easy one, but
> even if it were not, German-writing people would find a sentence such as
> "The dot or ... has no meaning at all." extremely weird. The dot is
> there (and in German, has to be there) because it's an abbreviation.
>
> Swedes employ ":" for abbreviations but often (always?) for eliding
> several word-interior letters. Definitely also a case of a non-optional
> convention.
>
> The use of superscript is tricky, because it can be optional in some
> contexts; if I write "3rd" in English, it will definitely be understood no
> different from "3ʳᵈ". Likewise with the several marks below superscripts.
> Whether "numero" has an underline or not appears to be a matter of font
> design, with some regional preferences (which also affect the style of the
> N).
>
> I'm very much with James that questions of what is spelling vs. what is
> style (decoration) can be a matter of opinion - or better perhaps, a matter
> of convention and associated expectations. And that there may not always be
> unanimity in the outcome.
>
> In TeX the two transition fluidly. If I was going to transcribe such texts
> in TeX, I would construct a macro for the construct of the entire
> abbreviation and would name it. That macro would raise the "r", and then -
> depending on the desired fidelity of the style of the document, might
> include secondary elements, such as underlining, or a squiggle.
>
> In the standard rich text model of plaintext "back bone" combined with
> font selection (and other styling), the named macro would correspond to
> encoding the semantic of an Mr abbreviation in the "superscript r"
> convention and the details would be handled in the font design.
>
> That system is perhaps not well suited to exact transcriptions because
> unlike TeX, it separates the two aspects, and removes the aspect of
> detailed glyph design from the control of the author, unless the latter is
> also a font-designer.
>
> Nevertheless, I think the use of devices like combining underlines and
> superscript letters in plain text is best avoided.
>
> A./
>
>
>


Re: A sign/abbreviation for "magister"

2018-10-29 Thread Marcel Schneider via Unicode
On 29/10/18 20:29, Doug Ewell via Unicode wrote:
[…]
> ObMagister: I agree that trying to reflect every decorative nuance of
> handwriting is not what plain text is all about.

Agreed.

> (I also disagree with
> those who insist that superscripted abbreviations are required for
> correct spelling in certain languages, and I expect to draw swift
> flamage for that stance.)

It all (no “flamage”, just trying to understand) depends on how we 
set the level of requirements, and what is understood by “correct”.
There is even an official position arguing that representing an "œ" 
with an "oe" string is correct, and that using the correct "œ" is 
not required. 

> The abbreviation in the postcard, rendered in
> plain text, is "Mr". Bringing U+02B3 or U+036C into the discussion

In English, “Mr” for “Mister” is correct, because English does not use 
superscript here, to my knowledge. Ordinal indicators are 
considered different, and require superscript in correct representation.
Thus, being trained on English, one cannot easily evaluate what is 
correct and what is required for correctness in a neighboring locale.

> just
> fuels the recurring demands for every Latin letter (and eventually those
> in other scripts) to be duplicated in subscript and superscript, à la
> L2/18-206.

That is a generic request, unrelated to any locale, based only on a kind 
of criticism of poor rendering systems. The “fake super-/subscripts” are 
already fixed if only OpenType is supported and fonts are complete.

> 
> Back into my hole now.

No worries. Stay tuned :-) Informed discussion brings advancement.

Best regards,

Marcel



Re: A sign/abbreviation for "magister"

2018-10-29 Thread Doug Ewell via Unicode
Richard Wordingham wrote:
 
>> I like palaeographic renderings of text very much indeed, and in fact
>> remain in conflict with members of the UTC (who still, alas, do NOT
>> communicate directly about such matters, but only in duelling ballot
>> comments) about some actually salient representations required for
>> medievalist use. The squiggle in your sample, Janusz, does not
>> indicate anything; it is only a decoration, and the abbreviation is
>> the same without it.
>
> I think this is one of the few cases where Multicode may have
> advantages over Unicode. In a mathematical context, aⁿ would be
> interpreted as _a_ applied _n_ times. As to "fⁿ", ambiguity may be
> avoided by the superscript being inappropriate for an exponent. What
> is redundant in one context may be significant in another.
 
Are you referring to the encoding described in the 1997 paper by
Mudawwar, which "address[es] Unicode's principal drawbacks" by switching
between language-specific character sets? Kind of like ISO 2022, but
less extensible?
 
ObMagister: I agree that trying to reflect every decorative nuance of
handwriting is not what plain text is all about. (I also disagree with
those who insist that superscripted abbreviations are required for
correct spelling in certain languages, and I expect to draw swift
flamage for that stance.) The abbreviation in the postcard, rendered in
plain text, is "Mr". Bringing U+02B3 or U+036C into the discussion just
fuels the recurring demands for every Latin letter (and eventually those
in other scripts) to be duplicated in subscript and superscript, à la
L2/18-206.

Back into my hole now.

--
Doug Ewell | Thornton, CO, US | ewellic.org
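
One way to see what "rendered in plain text" amounts to for these two characters: U+02B3 carries a <super> compatibility decomposition to U+0072, so compatibility normalization folds it to a plain "r", while U+036C is a combining mark with no decomposition and survives unchanged. A small Python sketch using the standard unicodedata module follows. The sample strings and the helper name fold are illustrative only and are not taken from the postcard.

```python
import unicodedata

# Illustrative strings, written with escapes so the characters are unambiguous.
candidates = {
    "modifier letter r (U+02B3)": "M\u02b3",
    "combining r over '=' (U+036C)": "M=\u036c",
}


def fold(text):
    """Apply compatibility normalization (NFKC) to the text."""
    return unicodedata.normalize("NFKC", text)


for label, text in candidates.items():
    print(f"{label}: {text!r} -> {fold(text)!r}")
```

NFKC is used here only to make the compatibility relationship visible; it says nothing about which spelling is the right one for a given transcription purpose.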




Re: A sign/abbreviation for "magister"

2018-10-29 Thread Richard Wordingham via Unicode
On Sun, 28 Oct 2018 20:42:04 +
Michael Everson via Unicode wrote:

> I like palaeographic renderings of text very much indeed, and in fact
> remain in conflict with members of the UTC (who still, alas, do NOT
> communicate directly about such matters, but only in duelling ballot
> comments) about some actually salient representations required for
> medievalist use. The squiggle in your sample, Janusz, does not
> indicate anything; it is only a decoration, and the abbreviation is
> the same without it.

I think this is one of the few cases where Multicode may have
advantages over Unicode.  In a mathematical context, aⁿ would be
interpreted as _a_ applied _n_ times.  As to "fⁿ", ambiguity may be
avoided by the superscript being inappropriate for an exponent.  What
is redundant in one context may be significant in another.

Richard. 



Re: A sign/abbreviation for "magister"

2018-10-29 Thread Asmus Freytag via Unicode

  
  
On 10/28/2018 11:50 PM, Martin J. Dürst via Unicode wrote:

> On 2018/10/29 05:42, Michael Everson via Unicode wrote:
>
>> This is no different from the Irish name McCoy which can be written MᶜCoy where
>> the raising of the c is actually just decorative, though perhaps it was once
>> an abbreviation for Mac. In some styles you can see a line or a dot under the
>> raised c. This is purely decorative.
>>
>> I would encode this as Mʳ if you wanted to make sure your data contained the
>> abbreviation mark. It would not make sense to encode it as M=ͬ or anything
>> else like that, because the “r” is not modifying a dot or a squiggle or an
>> equals sign. The dot or squiggle or equals sign has no meaning at all. And I
>> would not encode it as Mr͇, firstly because it would never render properly
>> and you might as well encode it as Mr. or M:r, and second because in the IPA
>> at least that character indicates an alveolar realization in disordered
>> speech. (Of course it could be used for anything.)
>
> I think this may depend on actual writing practice. In German at least,
> it is customary to have dots (periods) at the end of abbreviations, and
> using any other symbol, or not using the dot, would be considered an error.
>
> The question of how to encode that dot is fortunately an easy one, but
> even if it were not, German-writing people would find a sentence such as
> "The dot or ... has no meaning at all." extremely weird. The dot is
> there (and in German, has to be there) because it's an abbreviation.

Swedes employ ":" for abbreviations but often (always?) for eliding
several word-interior letters. Definitely also a case of a non-optional
convention.

The use of superscript is tricky, because it can be optional in some
contexts; if I write "3rd" in English, it will definitely be understood no
different from "3ʳᵈ". Likewise with the several marks below superscripts.
Whether "numero" has an underline or not appears to be a matter of font
design, with some regional preferences (which also affect the style of the
N).

I'm very much with James that questions of what is spelling vs. what is
style (decoration) can be a matter of opinion - or better perhaps, a matter
of convention and associated expectations. And that there may not always be
unanimity in the outcome.

In TeX the two transition fluidly. If I was going to transcribe such texts
in TeX, I would construct a macro for the construct of the entire
abbreviation and would name it. That macro would raise the "r", and then -
depending on the desired fidelity of the style of the document, might
include secondary elements, such as underlining, or a squiggle.

In the standard rich text model of plaintext "back bone" combined with
font selection (and other styling), the named macro would correspond to
encoding the semantic of an Mr abbreviation in the "superscript r"
convention and the details would be handled in the font design.

That system is perhaps not well suited to exact transcriptions because
unlike TeX, it separates the two aspects, and removes the aspect of
detailed glyph design from the control of the author, unless the latter is
also a font-designer.

Nevertheless, I think the use of devices like combining underlines and
superscript letters in plain text is best avoided.

A./



  



Re: A sign/abbreviation for "magister"

2018-10-29 Thread Janusz S. Bień via Unicode
On Mon, Oct 29 2018 at  7:57 GMT, James Kass wrote:
> Janusz S. Bień asked,
>
>> Do you claim that in the ground-truth for HWR the
>> squiggle and raising doesn't matter?
>
> Not me!

I know, sorry if my previous mail was confusing.

> "McCoy", "M=ͨCoy", and "M-ͨCoy" are three different ways of
> writing the same surname.  If I were entering plain text data from an
> old post card, I'd try to keep the data as close to the source as
> possible.  Because that would be my purpose.  Others might have
> different purposes.  As you state, it depends on the intention. But,
> if there were an existing plain text convention I'd be inclined to use
> it.  Conventions allow for the possibility of interchange, direct
> encoding would ensure it.

Best regards

Janusz

-- 
             ,
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien



Re: A sign/abbreviation for "magister"

2018-10-29 Thread James Kass via Unicode



Janusz S. Bień asked,

> Do you claim that in the ground-truth for HWR the
> squiggle and raising doesn't matter?

Not me!  "McCoy", "M=ͨCoy", and "M-ͨCoy" are three different ways of 
writing the same surname.  If I were entering plain text data from an 
old post card, I'd try to keep the data as close to the source as 
possible.  Because that would be my purpose.  Others might have 
different purposes.  As you state, it depends on the intention. But, if 
there were an existing plain text convention I'd be inclined to use it.  
Conventions allow for the possibility of interchange, direct encoding 
would ensure it.




Re: A sign/abbreviation for "magister"

2018-10-29 Thread Martin J . Dürst via Unicode
On 2018/10/29 05:42, Michael Everson via Unicode wrote:
> This is no different from the Irish name McCoy which can be written MᶜCoy where 
> the raising of the c is actually just decorative, though perhaps it was once 
> an abbreviation for Mac. In some styles you can see a line or a dot under the 
> raised c. This is purely decorative.
> 
> I would encode this as Mʳ if you wanted to make sure your data contained the 
> abbreviation mark. It would not make sense to encode it as M=ͬ or anything 
> else like that, because the “r” is not modifying a dot or a squiggle or an 
> equals sign. The dot or squiggle or equals sign has no meaning at all. And I 
> would not encode it as Mr͇, firstly because it would never render properly 
> and you might as well encode it as Mr. or M:r, and second because in the IPA 
> at least that character indicates an alveolar realization in disordered 
> speech. (Of course it could be used for anything.)

I think this may depend on actual writing practice. In German at least, 
it is customary to have dots (periods) at the end of abbreviations, and 
using any other symbol, or not using the dot, would be considered an error.

The question of how to encode that dot is fortunately an easy one, but 
even if it were not, German-writing people would find a sentence such as 
"The dot or ... has no meaning at all." extremely weird. The dot is 
there (and in German, has to be there) because it's an abbreviation.

Regards,   Martin.
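
For readers whose fonts do not render the combining marks, the three candidate encodings named in the quoted message can be taken apart code point by code point. This is only a sketch in Python using the standard unicodedata module. The dictionary labels and the helper name code_points are invented for illustration, and the strings are written with escapes on the assumption that the characters meant are U+02B3, U+036C and U+0347.

```python
import unicodedata

# The three candidate encodings, written with escapes so that the
# combining marks are unambiguous regardless of font support.
encodings = {
    "modifier letter": "M\u02b3",
    "combining r on equals sign": "M=\u036c",
    "combining equals sign below": "Mr\u0347",
}


def code_points(text):
    """List each character as (code point, character name, is_combining)."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch) != 0)
        for ch in text
    ]


for label, text in encodings.items():
    print(label)
    for row in code_points(text):
        print("   ", row)
```

The is_combining flag is derived from the canonical combining class, so it marks the two combining characters but not the modifier letter, which is a spacing character in its own right.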