On August 23, 2011, Asmus Freytag wrote:

> On 8/23/2011 7:22 AM, Doug Ewell wrote:
>> Of all applications, a word processor or DTP application would want
>> to know more about the properties of characters than just whether
>> they are RTL. Line breaking, word breaking, and case mapping come to
>> mind.
>>
>> I would think the format used by standard UCD files, or the XML
>> equivalent, would be preferable to making one up:
>
> The right answer would follow the XML format of the UCD.
>
> That's the only format that allows all necessary information contained
> in one file, and it would leverage any effort that users of the
> main UCD have made in parsing the XML format.
>
> An XML format should also be flexible in that you can add/remove not
> just characters, but properties as needed.
>
> The worst thing to do, other than designing something from scratch,
> would be to replicate the UnicodeData.txt layout with its random, but
> fixed collection of properties and insanely many semi-colons. None of
> the existing UCD txt files carries all the needed data in a single
> file.
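
To make the quoted suggestion concrete, here is a rough sketch of what a single-file PUA property repository in the UAX #42 style might look like, and how little code a consumer would need to read it. The two code points, their names, and their property values are invented for illustration and do not belong to any real PUA scheme; the element names, the namespace, and the attribute abbreviations (cp, na, gc, bc, lb) are the ones UAX #42 uses. The function name load_pua_properties is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the XML representation of the UCD (UAX #42).
UCD_NS = {"ucd": "http://www.unicode.org/ns/2003/ucd/1.0"}

# Invented sample data: a hypothetical PUA scheme with two characters.
SAMPLE = """\
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <description>Hypothetical PUA scheme (illustration only)</description>
  <repertoire>
    <char cp="E000" na="EXAMPLE LETTER ALPHA" gc="Lo" bc="R" lb="AL"/>
    <char cp="E001" na="EXAMPLE COMBINING MARK" gc="Mn" bc="NSM" lb="CM"/>
  </repertoire>
</ucd>
"""


def load_pua_properties(xml_text):
    """Return {code point: {property abbreviation: value}} for each <char>."""
    root = ET.fromstring(xml_text)
    props = {}
    for char in root.iterfind("ucd:repertoire/ucd:char", UCD_NS):
        cp = char.get("cp")
        if cp is None:
            # Ranges (first-cp/last-cp) are skipped in this sketch.
            continue
        attrs = dict(char.attrib)
        del attrs["cp"]
        props[int(cp, 16)] = attrs
    return props


props = load_pua_properties(SAMPLE)
for cp, attrs in sorted(props.items()):
    print(f"U+{cp:04X}", attrs)
```

Adding or dropping a property here is just adding or dropping an attribute, which is the flexibility the quoted message asks for; a lookup such as props[0xE000]["bc"] returns "R" for the sample above.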

I don't know if or how I responded 7 years ago, but at least today, I think this is an excellent suggestion.

If the goal is to encourage vendors to support PUA assignments, using an exceedingly well-defined format (UAX #42) sitting atop one of the most widely used base formats ever (XML), with all property information in a single repository (per PUA scheme), would be great encouragement.

I've devised lots of novel file formats and I think this is one use case where that would be a real hindrance.

Storing this information in a font, by hook or crook, would lock users of those PUA characters into that font. At that rate, you might as well use ASCII-hacked fonts, as we did 25 years ago.

--
Doug Ewell | Thornton, CO, US | ewellic.org