Re: A sign/abbreviation for "magister"
On 10/29/2018 8:06 PM, James Kass via Unicode wrote:
> […] could be typed on old-style mechanical typewriters. Quintessential plain-text, that.

Nope. Typewriters were regularly used for underscoring and for strikethrough, both of which are *styling* of text, and not plain text. The mere fact that some visual aspect of graphic representation on a page of paper can be implemented via a mechanical typewriter does not, ipso facto, mean that that particular feature is plain text. The fact that I could also implement superscripting and subscripting on a mechanical typewriter by turning the platen up and down half a line also does not make *those* aspects of text styling plain text either.

The same reasoning applies to handwriting, only more so.

--Ken
Re: A sign/abbreviation for "magister"
Asmus Freytag wrote, > Nevertheless, I think the use of devices like combining underlines > and superscript letters in plain text are best avoided. That's probably true according to the spirit of the underlying encoding principles. But hasn't that genie already left the bottle? People write their names as they please. With the entire repertoire of Unicode from which to choose, people are coming up with some amazingly unorthodox ways to "spell" their screen names. Here's six screen names copy/pasted from an atypical Twitter account's comments sections: Joё дмёгicди I⃟MAGI⃟NER Եօʍ IXOYE444 (←This one included character U+200F, I removed it.) Qęy ✪ eT ✪ Dog ► VOTES ❄️ Kἶოἶղმz ❄️ (←Note the decorative emoji.) People are mixing scripts and so forth in order to create distinctive screen names. Those screen names are out there in the wild and are part of our stored data which future historians are welcome to scratch their heads over. IIRC, around the time that the math alphanumerics were added to Plane One, Michael Everson noted that once characters are encoded people will use them as they see fit. In this present thread, Michael Everson wrote: > And I would not encode it as Mr͇, firstly because it > would never render properly and you might as well > encode it as Mr. or M:r, and second because in the > IPA at least that character indicates an alveolar > realization in disordered speech. (Of course it > could be used for anything.) Yes, it could be used for anything requiring combining-two-lines-below. At some point, if enough people were doing it, it would morph from a kludge of hacking alveolar whatevers into an accepted convention. (Not that I am pushing this approach, it's only one suggestion out of many possibilities. I'm in favor of direct encoding.) I would not encode the abbreviation as either "Mr." or "M:r" because neither of those text strings appear in the original manuscript. FAICT, "Եօʍ" is pronounced just like "Tom", but it ain't spelled the same. 
Likewise for "McCoy" and "M=ͨCoy". It strikes me as perverse if "Եօʍ" can spell his name as he pleases using the UCS but "M=ͨCoy" mustn't. Especially since names like "M=ͨCoy" and abbreviations such as "M=ͬ" could be typed on old-style mechanical typewriters. Quintessential plain-text, that.
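For readers comparing the candidate spellings discussed in this thread, it can help to look at them code point by code point. The following is only an illustrative sketch using Python's standard `unicodedata` module; the helper name `describe`, the labels, and the use of escapes are assumptions made for this example and are not something any participant proposed.

```python
import unicodedata

# Candidate plain-text spellings discussed in the thread, written with
# escapes so the combining marks stay visible in source form.
CANDIDATES = {
    "modifier letter": "M\u02b3",       # M + raised r as a spacing letter
    "combining r on '='": "M=\u036c",   # M, equals sign, combining small r
    "double line below": "Mr\u0347",    # M, r, combining equals sign below
}

def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

if __name__ == "__main__":
    for label, text in CANDIDATES.items():
        print(label)
        for code_point, name in describe(text):
            print(f"  {code_point}  {name}")
```

Seen this way, the three spellings are different character sequences, so a plain string comparison will not treat them as the same abbreviation unless an application adds its own folding.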
Re: A sign/abbreviation for "magister"
For the case of "Mister" vs. "Magister", the (double) underlining is not just a stylistic option but conveys semantics as an explicit abbreviation mark! We are here at the line between what is pure visual encoding (e.g. using superscript letters) and logical encoding (as done everywhere else in Unicode with combining sequences; the best-known exception being the Thai script, which uses the visual model). Obviously the Latin script should not use any kind of visual encoding, and even the superscript letters (initially introduced for something else, notably as distinct symbols for IPA) were not the correct path (they also have limitations, because the set of superscript letters is quite limited; the same can be said about the visual encoding of mathematical symbols as stylistic variants turned into plain characters, which will always be incomplete, while it could just as well be represented logically). So Unicode does not have a consistent policy (and this inconsistency was not introduced only for legacy round-trip compatibility, as with the Numero abbreviation or the encoding of the Thai script). On Mon, 29 Oct 2018 at 12:44, Asmus Freytag via Unicode wrote: > On 10/28/2018 11:50 PM, Martin J. Dürst via Unicode wrote: > > On 2018/10/29 05:42, Michael Everson via Unicode wrote: > > This is no different the Irish name McCoy which can be written MᶜCoy where > the raising of the c is actually just decorative, though perhaps it was once > an abbreviation for Mac. In some styles you can see a line or a dot under the > raised c. This is purely decorative. > > I would encode this as Mʳ if you wanted to make sure your data contained the > abbreviation mark. It would not make sense to encode it as M=ͬ or anything > else like that, because the “r” is not modifying a dot or a squiggle or an > equals sign. The dot or squiggle or equals sign has no meaning at all. And I > would not encode it as Mr͇, firstly because it would never render properly > and you might as well encode it as Mr. 
or M:r, and second because in the IPA > at least that character indicates an alveolar realization in disordered > speech. (Of course it could be used for anything.) > > > I think this may depend on actual writing practice. In German at least, > it is customary to have dots (periods) at the end of abbreviations, and > using any other symbol, or not using the dot, would be considered an error. > > The question of how to encode that dot is fortunately an easy one, but > even if it were not, German-writing people would find a sentence such as > "The dot or ... has no meaning at all." extremely weird. The dot is > there (and in German, has to be there) because it's an abbreviation. > > Swedes employ ":" for abbreviations but often (always?) for eliding > several word-interior letters. Definitely also a case of a non-optional > convention. > > The use of superscript is tricky, because it can be optional in some > contexts; if I write "3rd" in English, it will definitely be understood no > different from "3rd". Likewise with the several marks below superscripts. > Whether "numero" has an underline or not appears to be a matter of font > design, with some regional preferences (which also affect the style of the > N). > > I'm very much with James that questions of what is spelling vs. what is > style (decoration) can be a matter of opinion - or better perhaps, a matter > of convention and associated expectations. And that there may not always be > unanimity in the outcome. > > In TeX the two transition fluidly. If I was going to transcribe such texts > in TeX, I would construct a macro for the construct of the entire > abbreviation and would name it. That macro would raise the "r", and then - > depending on the desired fidelity of the style of the document, might > include secondary elements, such as underlining, or a squiggle. 
> > In the standard rich text model of plaintext "back bone" combined with > font selection (and other styling), the named macro would correspond to > encoding the semantic of an Mr abbreviation in the "superscript r" > convention and the details would be handled in the font design. > > That system is perhaps not well suited to exact transcriptions because > unlike Tex, it separates the two aspects, and removes the aspect of > detailed glyph design from the control of the author, unless the latter is > also a font-designer. > > Nevertheless, I think the use of devices like combining underlines and > superscript letters in plain text are best avoided. > > A./ > > >
Re: A sign/abbreviation for "magister"
On 29/10/18 20:29, Doug Ewell via Unicode wrote:
[…]
> ObMagister: I agree that trying to reflect every decorative nuance of
> handwriting is not what plain text is all about.

Agreed.

> (I also disagree with
> those who insist that superscripted abbreviations are required for
> correct spelling in certain languages, and I expect to draw swift
> flamage for that stance.)

It all (no “flamage”, just trying to understand) depends on how we set the level of requirements, and what is understood by “correct”. There is even an official position arguing that representing an "œ" with an "oe" string is correct, and that using the correct "œ" is not required.

> The abbreviation in the postcard, rendered in
> plain text, is "Mr". Bringing U+02B3 or U+036C into the discussion

In English, “Mr” for “Mister” is correct, because English does not use superscript here, to my knowledge. Ordinal indicators are considered different, and require superscript for correct representation. Thus, having been trained on English, one cannot easily evaluate what is correct and what is required for correctness in a neighboring locale.

> just
> fuels the recurring demands for every Latin letter (and eventually those
> in other scripts) to be duplicated in subscript and superscript, à la
> L2/18-206.

That is a generic request, unrelated to any locale, based only on a kind of criticism of poor rendering systems. The “fake super-/subscripts” are already fixed if only OpenType is supported and fonts are complete.

> Back into my hole now.

No worries. Stay tuned :-) Informed discussion brings advancement.

Best regards,

Marcel
Re: A sign/abbreviation for "magister"
Richard Wordingham wrote: >> I like palaeographic renderings of text very much indeed, and in fact >> remain in conflict with members of the UTC (who still, alas, do NOT >> communicate directly about such matters, but only in duelling ballot >> comments) about some actually salient representations required for >> medievalist use. The squiggle in your sample, Janusz, does not >> indicate anything; it is only a decoration, and the abbreviation is >> the same without it. > > I think this is one of the few cases where Multicode may have > advantages over Unicode. In a mathematical contest, aⁿ would be > interpreted as _a_ applied _n_ times. As to "fⁿ", ambiguity may be > avoided by the superscript being inappropriate for an exponent. What > is redundant in one context may be significant in another. Are you referring to the encoding described in the 1997 paper by Mudawwar, which "address[es] Unicode's principal drawbacks" by switching between language-specific character sets? Kind of like ISO 2022, but less extensible? ObMagister: I agree that trying to reflect every decorative nuance of handwriting is not what plain text is all about. (I also disagree with those who insist that superscripted abbreviations are required for correct spelling in certain languages, and I expect to draw swift flamage for that stance.) The abbreviation in the postcard, rendered in plain text, is "Mr". Bringing U+02B3 or U+036C into the discussion just fuels the recurring demands for every Latin letter (and eventually those in other scripts) to be duplicated in subscript and superscript, à la L2/18-206. Back into my hole now. -- Doug Ewell | Thornton, CO, US | ewellic.org
Re: A sign/abbreviation for "magister"
On Sun, 28 Oct 2018 20:42:04 + Michael Everson via Unicode wrote: > I like palaeographic renderings of text very much indeed, and in fact > remain in conflict with members of the UTC (who still, alas, do NOT > communicate directly about such matters, but only in duelling ballot > comments) about some actually salient representations required for > medievalist use. The squiggle in your sample, Janusz, does not > indicate anything; it is only a decoration, and the abbreviation is > the same without it. I think this is one of the few cases where Multicode may have advantages over Unicode. In a mathematical context, aⁿ would be interpreted as _a_ applied _n_ times. As to "fⁿ", ambiguity may be avoided by the superscript being inappropriate for an exponent. What is redundant in one context may be significant in another. Richard.
Re: A sign/abbreviation for "magister"
On 10/28/2018 11:50 PM, Martin J. Dürst via Unicode wrote:
> On 2018/10/29 05:42, Michael Everson via Unicode wrote:
>> This is no different the Irish name McCoy which can be written MᶜCoy where the raising of the c is actually just decorative, though perhaps it was once an abbreviation for Mac. In some styles you can see a line or a dot under the raised c. This is purely decorative.
>>
>> I would encode this as Mʳ if you wanted to make sure your data contained the abbreviation mark. It would not make sense to encode it as M=ͬ or anything else like that, because the “r” is not modifying a dot or a squiggle or an equals sign. The dot or squiggle or equals sign has no meaning at all. And I would not encode it as Mr͇, firstly because it would never render properly and you might as well encode it as Mr. or M:r, and second because in the IPA at least that character indicates an alveolar realization in disordered speech. (Of course it could be used for anything.)
>
> I think this may depend on actual writing practice. In German at least, it is customary to have dots (periods) at the end of abbreviations, and using any other symbol, or not using the dot, would be considered an error.
>
> The question of how to encode that dot is fortunately an easy one, but even if it were not, German-writing people would find a sentence such as "The dot or ... has no meaning at all." extremely weird. The dot is there (and in German, has to be there) because it's an abbreviation.

Swedes employ ":" for abbreviations but often (always?) for eliding several word-interior letters. Definitely also a case of a non-optional convention.

The use of superscript is tricky, because it can be optional in some contexts; if I write "3rd" in English with the "rd" raised, it will definitely be understood no different from "3rd". Likewise with the several marks below superscripts. Whether "numero" has an underline or not appears to be a matter of font design, with some regional preferences (which also affect the style of the N).

I'm very much with James that questions of what is spelling vs. what is style (decoration) can be a matter of opinion - or better perhaps, a matter of convention and associated expectations. And that there may not always be unanimity in the outcome.

In TeX the two transition fluidly. If I was going to transcribe such texts in TeX, I would construct a macro for the construct of the entire abbreviation and would name it. That macro would raise the "r", and then - depending on the desired fidelity of the style of the document, might include secondary elements, such as underlining, or a squiggle.

In the standard rich text model of plaintext "back bone" combined with font selection (and other styling), the named macro would correspond to encoding the semantic of an Mr abbreviation in the "superscript r" convention and the details would be handled in the font design.

That system is perhaps not well suited to exact transcriptions because unlike TeX, it separates the two aspects, and removes the aspect of detailed glyph design from the control of the author, unless the latter is also a font-designer.

Nevertheless, I think the use of devices like combining underlines and superscript letters in plain text are best avoided.

A./
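As a side note on how raised letters behave in processing: several of them carry compatibility decompositions, so NFKC normalization folds them to ordinary letters, whereas the combining small letters have no such mapping. The following is a minimal sketch with Python's standard `unicodedata` module; the function name `fold` and the sample strings are assumptions chosen for illustration, not a recommendation from anyone in the thread.

```python
import unicodedata

def fold(text):
    """Apply NFKC, which replaces compatibility characters by plain equivalents."""
    return unicodedata.normalize("NFKC", text)

SAMPLES = [
    "M\u02b3",      # M + MODIFIER LETTER SMALL R
    "M\u1d9cCoy",   # M + MODIFIER LETTER SMALL C + "Coy"
    "\u2116",       # NUMERO SIGN
    "M=\u036c",     # M, "=", COMBINING LATIN SMALL LETTER R
]

if __name__ == "__main__":
    for sample in SAMPLES:
        folded = fold(sample)
        status = "unchanged" if folded == sample else "folded"
        print(f"{sample!r} -> {folded!r} ({status})")
```

Under these assumptions the modifier-letter spellings and the numero sign fold to plain letters, while the sequence with the combining letter keeps its mark, so the two approaches are likely to behave differently in search and comparison.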
Re: A sign/abbreviation for "magister"
On Mon, Oct 29 2018 at 7:57 GMT, James Kass wrote:
> Janusz S. Bień asked,
>
>> Do you claim that in the ground-truth for HWR the
>> squiggle and raising doesn't matter?
>
> Not me!

I know, sorry if my previous mail was confusing.

> "McCoy", "M=ͨCoy", and "M-ͨCoy" are three different ways of
> writing the same surname. If I were entering plain text data from an
> old post card, I'd try to keep the data as close to the source as
> possible. Because that would be my purpose. Others might have
> different purposes. As you state, it depends on the intention. But,
> if there were an existing plain text convention I'd be inclined to use
> it. Conventions allow for the possibility of interchange, direct
> encoding would ensure it.

Best regards

Janusz

--
Janusz S. Bien emeryt (emeritus) https://sites.google.com/view/jsbien
Re: A sign/abbreviation for "magister"
Janusz S. Bień asked, > Do you claim that in the ground-truth for HWR the > squiggle and raising doesn't matter? Not me! "McCoy", "M=ͨCoy", and "M-ͨCoy" are three different ways of writing the same surname. If I were entering plain text data from an old post card, I'd try to keep the data as close to the source as possible. Because that would be my purpose. Others might have different purposes. As you state, it depends on the intention. But, if there were an existing plain text convention I'd be inclined to use it. Conventions allow for the possibility of interchange, direct encoding would ensure it.
Re: A sign/abbreviation for "magister"
On 2018/10/29 05:42, Michael Everson via Unicode wrote: > This is no different the Irish name McCoy which can be written MᶜCoy where > the raising of the c is actually just decorative, though perhaps it was once > an abbreviation for Mac. In some styles you can see a line or a dot under the > raised c. This is purely decorative. > > I would encode this as Mʳ if you wanted to make sure your data contained the > abbreviation mark. It would not make sense to encode it as M=ͬ or anything > else like that, because the “r” is not modifying a dot or a squiggle or an > equals sign. The dot or squiggle or equals sign has no meaning at all. And I > would not encode it as Mr͇, firstly because it would never render properly > and you might as well encode it as Mr. or M:r, and second because in the IPA > at least that character indicates an alveolar realization in disordered > speech. (Of course it could be used for anything.) I think this may depend on actual writing practice. In German at least, it is customary to have dots (periods) at the end of abbreviations, and using any other symbol, or not using the dot, would be considered an error. The question of how to encode that dot is fortunately an easy one, but even if it were not, German-writing people would find a sentence such as "The dot or ... has no meaning at all." extremely weird. The dot is there (and in German, has to be there) because it's an abbreviation. Regards, Martin.