On Mon, Apr 20, 2015 at 06:03:49PM +0000, John Colvin via Digitalmars-d wrote:
> On Monday, 20 April 2015 at 17:48:17 UTC, Panke wrote:
> >To measure the columns needed to print a string, you'll need the
> >number of graphemes. (d|)?string.length gives you the number of code
> >units.
>
> Even that's not really true. In the end it's up to the font and layout
> engine to decide how much space anything takes up. Unicode doesn't
> play nicely with the idea of text as a grid of rows and fixed-width
> columns of characters, although quite a lot can (and is, see urxvt for
> example) be shoe-horned in.
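
To make the code unit / grapheme distinction above concrete, here is a minimal sketch of how the counts differ for a single user-perceived character. It is only an illustration: the sample string is my own choice, and it assumes Phobos's std.uni.byGrapheme and std.range.walkLength, with the code point count relying on auto-decoding of narrow strings.

```d
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme;

void main()
{
    // "é" spelled as 'e' followed by U+0301 COMBINING ACUTE ACCENT
    // (sample input chosen for illustration only).
    string s = "e\u0301";
    dstring d = "e\u0301"d;

    auto codeUnits  = s.length;                 // UTF-8 code units
    auto codePoints = s.walkLength;             // code points, via auto-decoding
    auto graphemes  = s.byGrapheme.walkLength;  // grapheme clusters

    assert(codeUnits == 3);   // 1 byte for 'e' + 2 bytes for U+0301
    assert(d.length == 2);    // UTF-32: one code unit per code point
    assert(codePoints == 2);
    assert(graphemes == 1);

    writefln("%s code units, %s code points, %s grapheme(s)",
             codeUnits, codePoints, graphemes);
}
```

None of these numbers is, by itself, a column count.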

Yeah, even the grapheme count does not necessarily tell you how wide the printed string really is. The characters in the CJK blocks are usually rendered with fonts that are, on average, twice as wide as your typical Latin/Cyrillic character, so even applications like urxvt that shoehorn proportional-width fonts into a text grid render CJK characters as two columns rather than one.

Because of this, I actually wrote a function at one time to determine the width of a given Unicode character (i.e., zero, single, or double) as displayed in urxvt. Obviously, this is no help if you need to wrap lines rendered with a proportional font. And it doesn't even attempt to work correctly with bidi text.

This is why I said at the beginning that wrapping a line of text is a LOT harder than it sounds. A function that only takes a string as input does not have the necessary information to do this correctly in all use cases. The current wrap() function doesn't even do it correctly modulo the information available: it doesn't handle combining diacritics and zero-width characters properly. In fact, it doesn't even handle control characters properly, except perhaps for \t and \n. There are so many things wrong with the current wrap() function (and many other string-processing functions in Phobos) that it makes it look like a joke when we claim that D provides Unicode correctness out-of-the-box.

The only use case where wrap() gives the correct result is when you stick with pre-Unicode Latin strings to be displayed on a text console. As such, I don't really see the general utility of wrap() as it currently stands, and I question its value in Phobos, as opposed to a genuinely more useful implementation that, for instance, correctly implements the Unicode line-breaking algorithm.


T

-- 
It said to install Windows 2000 or better, so I installed Linux instead.
