RE: Ways to show Unicode contents on Windows?

2013-07-09 Thread Murray Sargent
: Tuesday, July 9, 2013 8:37 PM To: Unicode Discussion Subject: Re: Ways to show Unicode contents on Windows? On Wed, Jul 10, 2013 at 04:24:36AM +, Murray Sargent wrote: Ilya asked, Are there any other ways to show Unicode on Windows? You can download Unibook (http://www.unicode.org/unibook

RE: Word reversal from Abobe to Word

2013-02-08 Thread Murray Sargent
Albrecht notes that the complete RTF clipboard content is this, created by Adobe Acrobat 9 Pro, Version 9.5.1:
0000: 7B 5C 72 74 66 31 5C 61 6E 73 69 5C 61 6E 73 69  {\rtf1\ansi\ansi
0010: 63 70 67 31 32 35 32 5C 75 63 31 20 7B 5C 66 6F  cpg1252\uc1 {\fo
0020: 6E 74 74 62 6C 5C 66 30 5C 66

RE: Word reversal from Abobe to Word

2013-02-07 Thread Murray Sargent
If you include a {\fonttbl...} entry that defines \f0 as an Arabic font, Word displays it correctly. For example, include {\fonttbl{\f0\fswiss\fcharset177 Arial;}} as in {\rtf1{\fonttbl{\f0\fswiss\fcharset177 Arial;}} \pard\plain\ql\f0\fs20 {\fs40 \u1511 \'F7\u1493 \'E5\u1491 \'E3\u1502
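The \uN \'xx pairs in the RTF above can be generated mechanically. The following is a minimal sketch, not anything taken from Word or Acrobat: the function name `rtf_escape` is hypothetical, and it assumes a single-byte fallback code page (Windows-1255 here, chosen because the sample text is Hebrew) and the default of one fallback byte per \u keyword.

```python
def rtf_escape(text: str, codepage: str = "cp1255") -> str:
    """Encode text as RTF \\uN keywords, each followed by one \\'xx fallback byte.

    Assumptions of this sketch: BMP characters only, and a single-byte
    fallback code page, so exactly one byte follows each \\uN.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 and ch not in "\\{}":
            out.append(ch)  # plain ASCII passes through unchanged
            continue
        if cp > 0xFFFF:
            raise ValueError("this sketch handles BMP characters only")
        # RTF's \u keyword takes a signed 16-bit decimal value.
        n = cp - 0x10000 if cp > 0x7FFF else cp
        # Fallback byte for readers that ignore \u; '?' if the code page lacks it.
        fallback = ch.encode(codepage, errors="replace")
        out.append("\\u%d \\'%02X" % (n, fallback[0]))
    return "".join(out)


# The first three letters of the sample above (qof, vav, dalet).
print(rtf_escape("\u05e7\u05d5\u05d3"))
```

The fallback byte matters only to readers that do not understand \u; a Unicode-aware reader skips it.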

RE: Word reversal from Abobe to Word

2013-02-07 Thread Murray Sargent
Bing or Google the clipboard format string. You'll get the answer in the first few hits. Murray Sent from my Windows Phone From: Stephan Stiller [mailto:stephan.stil...@gmail.com] Sent: 2/7/2013 8:51 PM To: Dreiheller,

RE: Word reversal from Abobe to Word

2013-02-07 Thread Murray Sargent
by claiming the script has the reverse directionality. This enables Word to write RTF that represents an LRO...PDF embedding. Murray -Original Message- From: Asmus Freytag [mailto:asm...@ix.netcom.com] Sent: Thursday, February 7, 2013 9:28 PM To: Murray Sargent Cc: Dreiheller, Albrecht; Raymond

RE: cp1252 decoder implementation

2012-11-20 Thread Murray Sargent
Philippe commented: "(even if later Microsoft decides to map some other characters in its own windows-1252 charset, like it did several times and notably when the Euro symbol was mapped)." Personal opinion, but I'd be very surprised if Microsoft ever changed the 1252 charset. The euro was added

RE: Missing geometric shapes

2012-11-08 Thread Murray Sargent
Mark E. Shoulson m...@kli.org wrote: Mirroring tends to be done for glyphs that are used in *pairs*, open/close things and such. Not invariably; consider the integral and summation. They don't have mirrored counterparts and many other mathematical symbols don't either. Murray

RE: User-Hostile Text Editing (was: Unicode String Models)

2012-07-21 Thread Murray Sargent
[mailto:unicode-bou...@unicode.org] On Behalf Of Richard Wordingham Sent: Saturday, July 21, 2012 4:52 PM To: Unicode Subject: User-Hostile Text Editing (was: Unicode String Models) On Fri, 20 Jul 2012 23:16:17 + Murray Sargent murr...@exchange.microsoft.com wrote: My latest blog post

RE: Unicode String Models

2012-07-20 Thread Murray Sargent
Mark wrote: “I put together some notes on different ways for programming languages to handle Unicode at a low level. Comments welcome.” Nice article as far as it goes and additions are forthcoming. In addition to multiple code units per character in UTF-8 and UTF-16, there are variation

RE: combining: half, double, triple et cetera ad infinitum

2011-11-14 Thread Murray Sargent
QSJN 4 UKR asks, Why did the Unicode Consortium think that combination of one base character and few combining is possible, and combination of few base characters with one combining character is not? E.g. U+0483 tilda has to cover a number. Whole number! For mathematical constructs in general,

RE: Solidus variations

2011-10-07 Thread Murray Sargent
One set of examples of the use of these solidus variations occurs in the mathematics linear format described in Unicode Technical Note #28 (http://www.unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf). The ASCII solidus (U+002F) described in Section 2.1 is used to represent normal stacked

RE: Solidus variations

2011-10-07 Thread Murray Sargent
In the linear format of UTN #28, 1/2/3/4 builds up as ((1/2)/3)/4 as in computer languages like C. The notation actually started with C semantics and then added a larger set of operators, and finally adopted the full Unicode set of mathematical operators. You can try it out in Microsoft Office
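The left-to-right build-up described above is the same left associativity that C (and Python) give the division operator. A small sketch, using a hypothetical helper `build_up` and exact fractions so the grouping is visible in the result:

```python
from fractions import Fraction
from functools import reduce


def build_up(expr: str) -> Fraction:
    """Evaluate a chain such as '1/2/3/4' left to right: ((1/2)/3)/4."""
    operands = [Fraction(int(tok)) for tok in expr.split("/")]
    # reduce folds from the left, which is exactly the ((a/b)/c)/d grouping.
    return reduce(lambda acc, x: acc / x, operands)


print(build_up("1/2/3/4"))  # 1/24
```

A right-to-left grouping, 1/(2/(3/4)), would give 3/8 instead, so the two readings are easy to tell apart.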

RE: RTL PUA?

2011-08-22 Thread Murray Sargent
It's actually quite easy to convince Uniscribe to treat specific characters as RTL, others as LTR, and, in general, with whatever classifications you desire. Pass a preprocessed string to Uniscribe's ScriptItemize(). RichEdit has used that approach to some degree starting with RichEdit 3.0

RE: Combining Triple Diacritics (N3915) not accepted by UTC #125

2010-11-10 Thread Murray Sargent
You can put diacritics over an arbitrarily large base by using an accent object in a math zone. For example, in my email editor (Outlook), I type alt+= to insert a math zone and then (a+b)\tilde followed by two spaces to get a wide tilde over a+b. Evidently

RE: number padless?

2010-08-06 Thread Murray Sargent
In some Microsoft products, e.g., Word, WordPad, OneNote and Outlook, you can type ctrl+~ followed by n to get ñ. Or you can type F1 alt+x to get ñ. The alt+x conversion of hex Unicode values is easier than the alt+numpad approach, since the Unicode Standard is in hex. Murray From:

RE: number padless?

2010-08-06 Thread Murray Sargent
Type F1 alt+x, where F1 means the letter F key followed by the 1 key, not Function key 1. U+00F1 is the Unicode value of ñ. In general to type in a character by its Unicode value, type in the hex value and then alt+x. E.g., to type in math italic a, type 1D44E alt+x, which gives 𝑎. Murray
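What alt+x does to the typed hex digits can be mimicked in a couple of lines. This is only an illustration of the mapping, not of how Word or RichEdit implement it; the names `alt_x` and `alt_x_reverse` are hypothetical.

```python
def alt_x(hex_code: str) -> str:
    """Turn a hex code point string such as 'F1' or '1D44E' into the character."""
    return chr(int(hex_code, 16))


def alt_x_reverse(ch: str) -> str:
    """Turn a single character back into its uppercase hex code point."""
    return format(ord(ch), "X")


print(alt_x("F1"), alt_x("1D44E"))
print(alt_x_reverse("\u00f1"))
```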

RE: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-28 Thread Murray Sargent
Alex notes Operands are not operators, e.g. in a+b, a and b are operands, + is an operator. I'm sure Karl Williamson knows that, but the mathematical alphanumerics also aren't operators and they nevertheless have the math property. We need to change the description of the math property to

RE: High dot/dot above punctuation?

2010-07-28 Thread Murray Sargent
Contextual rendering is getting to be more common thanks to adoption of OpenType features. For example, both MS Publisher 2010 and MS Word 2010 support various contextually dependent OpenType features at the user's discretion. The choice of glyph for U+002E could be chosen according to an

RE: Pashto yeh characters

2010-07-28 Thread Murray Sargent
Andreas Prilop commented A native speaker of English does not /automatically/ know better about English grammar, English punctuation than an informed Frenchman. So true, so true. Most native speakers of English have only limited understanding of English grammar. At least in my country. They

RE: High dot/dot above punctuation?

2010-07-28 Thread Murray Sargent
Asmus asks, Which implementation makes the required context analysis to determine whether 002E is part of a number during layout? If it does make this determination, which OpenType feature does it invoke? Which font supports this particular OpenType feature? I haven't looked to see if our

RE: High dot/dot above punctuation?

2010-07-28 Thread Murray Sargent
Michael asks, Are or will be OT features supported in, say, filenames? The answer depends on the renderer. For example, if you display filenames in NotePad using the Calibri font, default English ligatures are used automatically using OpenType table info. Murray

RE: High dot/dot above punctuation?

2010-07-28 Thread Murray Sargent
Michael asks, Are or will be OT features supported in, say, filenames? The answer depends on the renderer. For example, if you display filenames in NotePad using the Calibri font, default English ligatures are used automatically using OpenType table info. I meant on the desktop or in the

RE: Plain text (was: Re: High dot/dot above punctuation?)

2010-07-28 Thread Murray Sargent
Doug comments: Murray Sargent murrays at exchange dot microsoft dot com wrote: It's worth remembering that plain text is a format that was introduced due to the limitations of early computers. Books have always been rendered with at least some degree of rich text. And due

RE: Generic Base Letter

2010-06-29 Thread Murray Sargent
Vincent asks, So how does one go about getting buy-in? Are the interested parties on this mailing list, or do you have contact information for decision makers in the various voting organizations? I think you, Khaled, Michael and others have made a very good case for having some way to render

RE: Generic Base Letter

2010-06-28 Thread Murray Sargent
Khaled notes: There are so many issues with MS implementation(s), for example you can not combine any arbitrary Arabic diacritical marks on any given base character. I don't think Unicode need to invent workaround broken vendor implementations, interested parties should instead pressure on that

RE: Unicode math examples

2010-06-09 Thread Murray Sargent
Doug asks, Can anyone point me to some *real-world* examples of mathematics text encoded in Unicode, including (especially) the Mathematical Alphanumeric Symbols starting at U+1D400? Here are two documents with such text: Unicode Technical Report #25 Unicode Support for Mathematics

RE: Unicode Ruby

2004-12-19 Thread Murray Sargent
Couple of notes on Word's support. Word has been based on Unicode since Word '97, although it certainly didn't support all of Unicode at that time. Word has displayed ruby in built-up form for several versions now (the name for it is under Asian formatting and called phonetic guide). Murray

RE: Wide Characters in Windows and UTF16

2004-08-11 Thread Murray Sargent
Wide characters in Windows 2K and XP are used for UTF-16 for most programs that I know of including the Microsoft Office suite and OS programs such as NotePad and WordPad. Windows 9x has limited Unicode support, but many programs do use wide characters for UTF-16 on Windows 9x as well. Murray

RE: Surrogates in WordPad

2004-02-01 Thread Murray Sargent
Type the UTF-32 code for the character instead of the surrogate pair. For example to get a math italic i, type 1D456 and then Alt+x. Lone surrogate codes aren't desirable. RichEdit does allow the high code to come in alone via the WM_CHAR message, since some
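For reference, the relation between the UTF-32 code and the UTF-16 surrogate pair is plain arithmetic. A short sketch (the function name `to_surrogates` is made up for the example):

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP have surrogate pairs")
    v = cp - 0x10000                 # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)        # top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1D456)   # math italic i
print(f"{high:04X} {low:04X}")
```

Typing the single value 1D456 avoids ever having one half of the pair in the text on its own.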

Does Java 1.5 support Unicode math alphanumerics as variable names?

2004-01-23 Thread Murray Sargent
Title: Does Java 1.5 support Unicode math alphanumerics as variable names? E.g., math italic i (U+1D456)? With such usage, Java mathematical programs could look more like the original math. Thanks Murray

RE: Code points on Windows

2004-01-14 Thread Murray Sargent
Mike Ayers asked: On Windows, it is well known that you can generate a character from its code point by holding down the alt key and typing the code point in decimal, with a leading 0, on the numeric keypad. I recall that there is also a method to do this in reverse - given a character on, say,

RE: Code points on Windows

2004-01-14 Thread Murray Sargent
Raymond Mercier wrote: In MS Word if you type the Unicode code point, followed by Alt-X, you get the character (if you have the font). This works in reverse. Sometimes in a RichEdit control window it will work in the first direction, but not in reverse. It does not work in Wordpad, in spite of

RE: character map in Microsoft Word

2003-12-11 Thread Murray Sargent
WordPad uses RichEdit 4.1 on Windows XP and both RichEdit 4.1 and 3.0 support the Alt+NumPad numbers greater than 255 as Unicode values. But other editors on XP, e.g., NotePad do not (sigh). The preferred way with RichEdit is to use the hex code

RE: How can I input any Unicode character if I know its hexadecimal code?

2003-11-14 Thread Murray Sargent
Patrick asks: «Q. How can I input any Unicode character if I know its hexadecimal code?» You could use an app that supports the Alt+x input method (like Word or WordPad) and then copy the result into an app that doesn't. For reference, the Alt+x input method works as follows: A handy

RE: Hexadecimal digits?

2003-11-10 Thread Murray Sargent
An important part of Ricardo Niemietz's hex digit proposal (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2677) is to have columns of hexadecimal numbers line up properly as columns of decimal numbers do. This could be achieved using a font with a set of glyph variants for A-F with a hexadecimal

RE: question about Windows-1252 and Unicode mapping

2003-02-27 Thread Murray Sargent
I think the Euro at 0x80 for 1252 (and several other 125x code pages) was added in May 1988. Cathy Wissink can confirm this. It certainly happened before 1999, since we added support for it in RichEdit 3.0 which shipped with Windows 2000 and Office 2000. Murray -Original Message- From:
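The mapping itself is easy to check with any codec table that follows the current Windows-1252 definition. A quick sketch using Python's built-in `cp1252` codec:

```python
# Byte 0x80 in Windows-1252 decodes to the euro sign, U+20AC.
euro = b"\x80".decode("cp1252")
print(f"U+{ord(euro):04X}")

# The same byte in ISO 8859-1 is the C1 control U+0080, not the euro.
control = b"\x80".decode("latin-1")
print(f"U+{ord(control):04X}")
```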

RE: question about Windows-1252 and Unicode mapping

2003-02-27 Thread Murray Sargent
As KenW pointed out, I meant May 1998, not 1988! Thanks Murray -Original Message- From: Murray Sargent Sent: Thursday, February 27, 2003 3:44 PM To: 'Yung-Fong Tang' Cc: John Myers; Takayuki Tei; kat momoi; Naoki Hotta; Cathy Wissink; [EMAIL PROTECTED] Subject: RE: question about

RE: The result of the plane 14 tag characters review.

2002-11-13 Thread Murray Sargent
I think Doug asked for lightweight. HTML and XML markup aren't lightweight by any means, although a special purpose plain-text oriented XML (LTML for language-tagged markup language) might not be that much more involved than plane 14 tags. It would also have the advantage that standard XSLT tools

RE: Names for UTF-8 with and without BOM

2002-11-01 Thread Murray Sargent
Joseph Boyle says: It would be useful to have official names to distinguish UTF-8 with and without BOM. To see if a UTF-8 file has no BOM, you can just look at the first three bytes. Is this a problem? Typically when you care about a file's encoding form, you plan to read the file. Thanks Murray
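Looking at the first three bytes amounts to comparing them with EF BB BF. A minimal sketch, with a hypothetical helper name:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 signature EF BB BF."""
    return data[:3] == codecs.BOM_UTF8


print(has_utf8_bom(b"\xef\xbb\xbfhello"))
print(has_utf8_bom(b"hello"))
```

When reading a file, only its first three bytes need to be passed in, for example `has_utf8_bom(open(path, "rb").read(3))`.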

RE: script or block detection needed for Unicode fonts

2002-09-29 Thread Murray Sargent
John Jenkins wrote: "This just seems wildly inefficient to me, but then I'm coming from an OS where this isn't done. The app doesn't keep track of whether or not a particular font can draw a particular character; that's

RE: script or block detection needed for Unicode fonts

2002-09-28 Thread Murray Sargent
Michael Everson said: I don't understand why a particular bit has to be set in some table. Why can't the OS just accept what's in the font? The main reason is performance. If an application has to check the font cmap for every character in a file, it slows down reading the file. Accordingly

RE: glyph selection for Unicode in browsers

2002-09-26 Thread Murray Sargent
I don't think the idea is that codepage equals language. Rather codepage equals a writing system, which consists of one or more scripts (e.g., 6 scripts for ShiftJIS). As such the codepage is a useful cue in choosing an appropriate font for rendering text. In the RichEdit edit engine, we use a

RE: Furigana

2002-08-13 Thread Murray Sargent
As Ken says the Unicode interlinear annotation characters are for internal use only. Specifically, their meanings can be different for different programs. If you have your nice marked up text in memory and want to export it for use by some program, you need to use a higher-level protocol that

RE: Furigana

2002-08-13 Thread Murray Sargent
Michael Everson said Well then they [interlinear annotation characters] oughtn't to have been encoded. Michael, you aren't an implementer. When you implement things unambiguously, you may need internal code points in your plain-text stream to attach higher-level protocols (such as formatting

RE: Furigana

2002-08-13 Thread Murray Sargent
6:11 PM To: Murray Sargent Cc: Michael Everson; [EMAIL PROTECTED] Subject: Re: Furigana Murray, It's true implementers need some place to attach higher level protocols, but they don't need specific points for specific implementations of internal protocols. If they weren't good enough to be used

RE: Typing Unicode via Alt+NumPad

2002-08-11 Thread Murray Sargent
Actually any application using RichEdit 3.0 or later (e.g., WordPad and often Outlook) uses any value higher than 255 as a Unicode value. Values less than 255 are also Unicode, except for 0128 - 0159. Note that for values less than 255, you need to

RE: Inappropriate Proposals FAQ

2002-07-03 Thread Murray Sargent
Timothy Partridge included the restriction - No archaic styles of existing characters. E.g. dotless j. as something inappropriate. Question: how does one code up (presumably with markup) a caret over a jk pair in a math expression? The dot on the j should be missing for this case, but how does

RE: Can browsers show text? I don't think so!

2002-07-02 Thread Murray Sargent
Michael Jansson says: There are no technical reasons for why css/html4/xhtml can not produce every bit as high quality as any other page layout format. Sadly this is currently far from the case. HTML/CSS even including CSS3 is far from a professional document publishing format. It doesn't

RE: terminology

2002-05-02 Thread Murray Sargent
Sentinel is fairly commonly used in computer science and program code for data delimiters. Delimiter is also a good word for this (I use it in RichEdit code), but one may well use delimiter to describe a quote character (like U+0022), whereas I've never seen sentinel used for a quote. As such

RE: Concerning mathematics

2002-03-08 Thread Murray Sargent
Stefan Persson [mailto:[EMAIL PROTECTED]] asks how in the formula mfågel = 1 kg would the italic å be encoded? Mathematics has a set of standard letters for mathematical symbols. They can include diacritics, which can be expressed using the appropriate combining marks. In your formula

RE: How to make oo with combining breve/macron over pair?

2002-03-05 Thread Murray Sargent
MathML does have markup to extend diacritics across arbitrary numbers of characters and it's not likely that MathML would use the CGJ for this purpose. But it would be handy for representing such expressions in plain-text Unicode. Murray

RE: CRLF vs. LF (was Re: Unicode and end users)

2002-02-21 Thread Murray Sargent
I agree that NotePad ought to be able to display a pure LF file correctly. Word and WordPad do. However they do translate the LFs to CRLFs on saving, which limits their interoperability with Unix. It would be fairly easy to have an option to write LF files, if there's sufficient interest.
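The two conversions involved are simple byte replacements. The sketch below is only an illustration of the idea, not of how Word or WordPad save files; the function names are hypothetical.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to bare LF (Unix style)."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert line endings to CRLF; normalizing first avoids doubling existing CRs."""
    return to_lf(data).replace(b"\n", b"\r\n")


print(to_lf(b"one\r\ntwo\r\n"))
print(to_crlf(b"one\ntwo\r\n"))
```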

RE: Proposing Fraktur

2002-01-29 Thread Murray Sargent
David Starner said: Fraktur is not a different script from the Latin script, and therefore is not encoded separately. True, but Fraktur math characters are encoded in plane 1 for use in mathematics. These characters are not intended to be used for natural language purposes (unless you think

RE: The benefit of a symbol for 2 pi

2002-01-21 Thread Murray Sargent
Capital pi is to product as capital sigma is to summation. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Sun 2002/01/20 02:19 To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: The benefit of a

RE: ISCII-Unicode Conversion

2001-11-06 Thread Murray Sargent
Marco Cimarosti writes: Tom Emerson wrote: One gotcha, that I run into every six months or so, is forgetting that the punctuation characters in the Basic Latin block are classified as Latin script. This trips me up because most of my text processing work involves CJK, so I'll write

RE: GB18030

2001-09-21 Thread Murray Sargent
I think I've figured out a way to find the beginning of a GB18030 character starting anywhere in a document. The algorithm is similar to finding the beginning of a DBCS character in that you scan backward until you find a byte that can only come at the start of a character. The main difference

RE: Unicode/font questions.

2001-08-01 Thread Murray Sargent
Actually fonts on Windows are normally Unicode based (including MS Mincho and MS Gothic) and most have in addition some codepage access. So there is neither a perf hit nor a codepage problem in using such fonts on NT, Win2000 and WinXP. These considerations are orthogonal to OpenType. Murray

RE: UTF-17

2001-06-22 Thread Murray Sargent
Hey guys, Ken is just kidding. He's evidently tired of the current plethora of ways to represent Unicode let alone all those new ones being proposed. Sigh, I am too. Carl, you understand the problem of adding yet another UTF: you too will probably have to support it. Murray Carl Brown

RE: converting ISO 8859-1 character set text to ASCII (128)charactet set

2001-06-20 Thread Murray Sargent
If you need to roundtrip 8859-1 through ASCII, you need to use some kind of escape mechanism inside the ASCII to represent characters that have their high bit equal to one. A common simple escape is to use the backslash. So you could represent the codes as \'xx, where xx is the hexadecimal code.
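A round trip with the \'xx escape might look like the sketch below. The function names are hypothetical, and one detail is an assumption of the sketch rather than something stated above: the backslash itself is also written as \'5c so that decoding stays unambiguous.

```python
import re


def to_ascii(text: str) -> str:
    """Represent ISO 8859-1 text in pure ASCII using \\'xx escapes."""
    out = []
    for b in text.encode("latin-1"):
        if b >= 0x80 or b == 0x5C:      # high-bit bytes, plus the backslash itself
            out.append("\\'%02x" % b)
        else:
            out.append(chr(b))
    return "".join(out)


_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def from_ascii(escaped: str) -> str:
    """Reverse to_ascii: turn each \\'xx escape back into its 8859-1 character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)


sample = "caf\u00e9 na\u00efve"
print(to_ascii(sample))
print(from_ascii(to_ascii(sample)) == sample)
```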

RE: More trivia: Misc. Math. Symbols-B and decomposition

2001-06-08 Thread Murray Sargent
It's intriguing to think of an encoding for math symbols that breaks them down into sequences of pieces. For example, NOT EQUAL could be EQUAL followed by a slash combining mark. Maybe some day a cleanicode will be developed that handles this and related characters in a consistent, uniform way.

RE: Math operators

2001-06-05 Thread Murray Sargent
Unicode has many multiplication signs, e.g., U+00D7, U+00B7, U+2022, U+2219, U+2299, U+22A0, U+22C6, etc. In this spirit, you can probably include U+2605 (★) Murray -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Tuesday, June 05, 2001 11:59 AM To:
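To see what the listed code points actually are, their character names can be printed from the Unicode Character Database. A small sketch using Python's standard `unicodedata` module (the variable names are arbitrary):

```python
import unicodedata

# Code points mentioned in the message above.
CANDIDATES = (0x00D7, 0x00B7, 0x2022, 0x2219, 0x2299, 0x22A0, 0x22C6, 0x2605)

names = {cp: unicodedata.name(chr(cp)) for cp in CANDIDATES}
for cp, name in names.items():
    print(f"U+{cp:04X} {chr(cp)} {name}")
```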

RE: Script use by Mathematicians (Was Re: Single Unicode Font)

2001-05-22 Thread Murray Sargent
Has anyone ever made a character collection for mathematics? Please check out Unicode 3.1 and 3.2 (coming up). Characters from the STIX collaboration of a variety of mathematical sources such as AmSTeX and MathML have been collected into a math character set that seems to have the vast

RE: Property error for U+2118?

2001-02-01 Thread Murray Sargent
The Weierstrass symbol U+2118 isn't a capital letter in spite of its name, nor is it really an alphabet character. It's sort of a stylized mixture of a rho and a lower-case script p. However in view of the principle that character names never change, even if incorrect, this symbol remains the

RE: Benefits of Unicode

2001-01-29 Thread Murray Sargent
In some of my talks at the Unicode conferences (see "Tips and Tricks..."), I have addressed problems with Unicode, notably trying to figure out whether to use a Chinese Simplified/Traditional, Japanese, or Korean font to render a Chinese character inserted in a plain-text scenario. This is a

RE: lag time in Unicode implementations in OS, etc?

2000-10-13 Thread Murray Sargent
It would be great if things were that easy. But users typically don't want to worry about fonts. They enter a character, maybe by pasting plain text, and want it magically to appear as something other than the "missing-character" glyph. They probably don't even know if it's a

RE: surrogate terminology

2000-09-12 Thread Murray Sargent
For what it's worth, I've been referring to characters between 0x10000 and 0x10FFFF as "higher-plane" characters as distinguished from BMP characters. Seems to work well in a general way. For plane 1, I use "plane=1" characters, etc. Murray

RE: Identifying a Unicode character

2000-08-18 Thread Murray Sargent
If you can get the text into a Win32 RichEdit control version 3.0 or later (Office 2000 and/or Windows 2000 in WordPad), type Shift+Alt+x after the character and the character will be replaced by its Unicode hexadecimal value. If you type Alt+x, that code gets converted back into the Unicode

RE: APL letters

2000-07-17 Thread Murray Sargent
One interesting possibility for representing the APL italic characters would be to use the math italic alphabet in plane 1. The motivation for their use in APL is similar to that for the math case: the characters are separate symbols, e.g., they don't get grouped into natural language words. In

RE: Plane 14 language tags

2000-06-29 Thread Murray Sargent
Subject: Re: Plane 14 language tags Brendan Murray wrote: Murray Sargent [EMAIL PROTECTED] wrote: Note that in C, it's essentially just as fast to make character comparisons with (ch | 0x20) as with ch alone, i.e., if you know ch is in an ASCII range (0 - 0x7F or 0xE0000 - 0xE007F

RE: Plane 14 language tags

2000-06-28 Thread Murray Sargent
Note that in C, it's essentially just as fast to make character comparisons with (ch | 0x20) as with ch alone, i.e., if you know ch is in an ASCII range (0 - 0x7F or 0xE0000 - 0xE007F), you can do a case insensitive compare as quickly as a case sensitive one. The problem with assuming lower case
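The (ch | 0x20) trick relies on upper- and lower-case ASCII letters differing only in bit 0x20, and the plane 14 tag characters (U+E0000 to U+E007F) mirror the ASCII layout. The sketch below shows the idea in Python rather than C; the function name is hypothetical, and it is only valid when both inputs are already known to be letters.

```python
def same_letter_ignoring_case(a: str, b: str) -> bool:
    """Compare two letters case-insensitively by forcing bit 0x20 on both.

    Only meaningful when a and b are ASCII letters or plane 14 tag letters.
    """
    return (ord(a) | 0x20) == (ord(b) | 0x20)


print(same_letter_ignoring_case("A", "a"))
# Tag letters: U+E0041 TAG LATIN CAPITAL LETTER A vs U+E0061 TAG LATIN SMALL LETTER A.
print(same_letter_ignoring_case(chr(0xE0041), chr(0xE0061)))
# Caveat: for non-letters the trick gives false matches, e.g. '@' (0x40) and '`' (0x60).
print(same_letter_ignoring_case("@", "`"))
```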

RE: Twinbridge Word 2000

2000-06-27 Thread Murray Sargent
[EMAIL PROTECTED] asked: The question is: Is there any way for making True type fonts and Unicode compatible? The answer to this question is: Microsoft's implementation of TrueType has always been based on Unicode, right from the first version in 1992. The answer to the original question,