Re: Unicode for words?

Richard Cook Tue, 07 Dec 2004 03:05:33 -0800

On Dec 5, 2004, at 07:02 PM, Doug Ewell wrote:

A word-based encoding for English could automatically assume spaces
where they are appropriate.  The sentence:

"What means this, my lord?"

would have seven encodable elements: the five words, the comma, and the
question mark.  Spaces would be automatically filled in as needed, not
explicitly encoded.  This implies "standard" English punctuation and
spacing conventions, however that is defined.  For French conventions,
there would probably be a space before the question mark as well.
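
The spacing rule described above can be sketched in a few lines of Python. This is only an illustration, not a proposal from the thread: the `render` function, the `PUNCTUATION` set, and the `french` flag are all hypothetical names, and the rule set is deliberately tiny (quotation marks, hyphens, and the like are not handled).

```python
# A minimal sketch of "implicit spaces" in a word-based encoding.
# Assumption: every token is either a word or one of these punctuation marks.
PUNCTUATION = {",", ".", "?", "!", ";", ":"}

# Assumption: under French conventions these marks take a preceding space.
FRENCH_SPACED = {"?", "!", ";", ":"}


def render(tokens, french=False):
    """Join encoded elements into text, filling in spaces as needed."""
    out = []
    for tok in tokens:
        if tok in PUNCTUATION:
            # Punctuation attaches to the preceding word, except for the
            # marks that French conventions set off with a space.
            if french and tok in FRENCH_SPACED and out:
                out.append(" ")
        elif out:
            # A word is separated from whatever came before it.
            out.append(" ")
        out.append(tok)
    return "".join(out)


# The seven encodable elements: five words, the comma, the question mark.
tokens = ["What", "means", "this", ",", "my", "lord", "?"]
print(render(tokens))
print(render(tokens, french=True))
```

With `french=True` the only difference in the output is the space before the question mark.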

Well, why stop with words, my lord? Why not just encode all sentences, paragraphs, pages, chapters, books, libraries, or your higher-level unit of choice, for that matter?

For example, in my library, the single code point U+100000 happens to contain hi-res color images of all pages of an edition of Moby Dick that I happen to like very much.

Or consider an image-based encoding, which joins standard text to image. Images of the text to be encoded are indexed using some private indexing scheme, and the index elements are then mapped to a standard encoding. The relatively lo-res standard encoding (which must necessarily collapse some distinctions that are less generally important) is augmented with hi-res indexing of images of the specific text to be digitized.

Whether you choose to associate a single glyph with your private-use code point, or an entire book, why, that's up to you (and your software).
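
To make that concrete, here is a rough Python sketch of software that attaches its own meaning to private-use code points. The registry, its entries, and the function names are hypothetical; the only things taken from the standard are that U+100000 and U+E000 are private-use code points (general category `Co`), which `unicodedata` reports.

```python
import unicodedata

# Hypothetical private agreement: what each private-use code point means
# to this particular software. Nothing here is defined by Unicode itself.
PRIVATE_REGISTRY = {
    0x100000: {"kind": "book", "title": "Moby Dick"},  # a whole edition
    0xE000: {"kind": "glyph", "name": "some single glyph"},
}


def is_private_use(ch):
    """True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def interpret(text, registry=PRIVATE_REGISTRY):
    """Replace private-use characters with their registry entry, if any.

    Ordinary characters pass through unchanged; a private-use character
    with no registry entry becomes None.
    """
    result = []
    for ch in text:
        if is_private_use(ch):
            result.append(registry.get(ord(ch)))
        else:
            result.append(ch)
    return result


print(interpret("A" + chr(0x100000)))
```

Software that does not share the registry sees only an uninterpreted private-use character, which is the point: the meaning lives entirely in the private agreement.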
